ABSTRACT



This thesis develops and expands upon known techniques of mathematical 
physics relevant to the analysis of the popular Markov model of phylogenetic 
trees required in biology to reconstruct the evolutionary relationships of tax- 
onomic units from biomolecular sequence data. 

The techniques of mathematical physics are numerous and have been developed
over a long period. The Markov model of phylogenetics and its analysis is a
relatively new technique where most progress to date has been achieved by using
discrete mathematics. This thesis takes a group theoretical approach to the 
problem by beginning with a remarkable mathematical parallel to the process 
of scattering in particle physics. This is shown to equate to branching events 
in the evolutionary history of molecular units. The major technical result of 
this thesis is the derivation of existence proofs and computational techniques 
for calculating polynomial group invariant functions on a multi-linear space 
where the group action is that relevant to a Markovian time evolution. The 
practical results of this thesis are an extended analysis of the use of invariant 
functions in distance based methods and the presentation of a new recon- 
struction technique for quartet trees which is consistent with the most general 
Markov model of sequence evolution. 
Chapter 1



Introduction 



The rationale of this thesis is taken from a remarkable analogy between the 
stochastic models used to infer phylogenetic relationships in mathematical 
biology and the structure of multiparticle quantum physics. There is a di- 
rect relationship between Feynman diagrams that describe the interactions 
of sub-atomic particles and phylogenetic trees that graphically represent the 
evolutionary relationship between taxonomic units. A Feynman diagram gives 
the graphical representation of creation and annihilation events of particle in- 
teractions. A taxonomic unit may be any biomolecular unit such as a gene, 
an amino acid or base pair, and the time evolution of these molecular units is 
modelled stochastically under a Markov assumption. Techniques which recon- 
struct the evolutionary history of molecular units from present observations 
are based on these models. Given the correct framework, these Markov mod- 
els and the formalism of multiparticle quantum mechanics can be put into 
a mathematical correspondence. This is a very useful observation because 
phylogenetics is a relatively new mathematical problem (for example see the 
classic paper by Felsenstein [Tj5]) whereas the mathematics of particle physics 
has been studied for over a century. (For an outstanding introduction to the 
history of theoretical particle physics see [17], and for a comprehensive intro- 
duction to mathematical physics see [61].) Given that there is a mathematical 
connection between the two problems it would certainly be unfortunate to 
see results that have been obtained in physics re-derived independently in the 
context of phylogenetics. This thesis looks at a particular aspect of quantum 
systems known as entanglement and shows that measures of entanglement can 
be utilized to improve the reconstruction of phylogenetic relationships. 
We will need to be clear that the probabilities associated with quantum sys- 
tems and those of phylogenetic models arise in quite a different scientific way. 
Quantum mechanics is a probabilistic theory because the theoretical predic- 
tions give the correct statistical behaviour regarding the outcomes of particular 
experiments. The theoretical predictions can be used to infer (incredibly accu- 
rately) the distribution of results for many repetitions of the same experiment. 
(For a popular discussion of the amazing accuracy of quantum theory see Feynman's
discussion of the magnetic moment of the electron as predicted from
quantum electrodynamics [22].) Since quantum theory is (and should be) seen
as a theory of nature there has been argument for many decades on how to
interpret this probabilistic aspect of quantum theory. This argument raises 
quite profound scientific and philosophical issues which, thankfully, we will 
not be concerned with in this thesis. Models of phylogenetics are exactly 
that - models, and should not be seen as being theories of nature. No one 
would argue that the time evolution of molecular units follows the Markov
model of phylogenetics in detail, but rather that these models are the best
(tractable) approximation that gives us recourse to establishing properties of
phylogenetic history. Primarily the points of interest are the branching struc- 
ture of the evolutionary history and also the evolutionary distance (or time) 
between branching events. 

After we have made the mathematical analogy between quantum theory and 
the Markov model of phylogenetics, we will concentrate on only a small part 
of what can be done using techniques known in mathematical physics. We 
will focus on the study of entanglement invariants and their generalization to 
the phylogenetic case [59], [60]. There is potential for concentrating on other
techniques such as Lie algebra symmetries [6] and the analysis of the path 
integral formulation [HDE2], but these techniques will not be explored here. 
The distance based technique has been used in phylogenetics as a tree build- 
ing algorithm following the discovery that it is possible to calculate a distance 
from the observed sequences that is consistent with the Markov model. This 
distance function is a well defined mathematical object known as a group in- 
variant function and is used in quantum physics to quantify and test for the 
phenomenon of entanglement. Entanglement is a general property that can 
exist in many different physical systems and the invariant function used as a 
distance measure in phylogenetics is used to quantify entanglement for only 
the most elementary case. Hence, it seems astute to investigate what the next 
most complicated types of entanglement correspond to in phylogenetics. 

Theoretical outcomes of the thesis 

We present a group representation theoretic analysis of the Markov model 
of phylogenetic trees. Specifically this formalism is used to construct all the 
one-dimensional representations of the (appropriately defined) Markov semi- 
group. These one-dimensional representations occur as polynomials in the 
(discrete) probability distributions predicted from the Markov model which 
we coin Markov invariants. We establish the connection between these one- 
dimensional representations and that of phylogenetic invariants [TT1 [T5| 1201 155] 
and pairwise distance measures [231 HU] . This representation theoretical ap- 
proach touches upon existing techniques and can be incorporated into known 
algorithms to give novel results and insights to the problem of phylogenetic 
reconstruction. The main theoretical outcome of the thesis is this use of rep- 
resentation theory. We will also develop the theory of invariants of the general 
linear group on a tensor product space and show how to infer existence of these 
invariants in different cases. We develop a procedure for computing the ex- 
plicit form of these invariant functions, firstly developed for the general linear 
group and then generalized to the Markov semigroup. 

Practical outcomes of the thesis

We study a group invariant function, well known in quantum physics as the 
tangle, in the context of phylogenetics. The tangle is used in physics to give 
a measure of the amount of entanglement between three qubits. Qubits are 
two state objects in quantum physics and correspond in phylogenetics to a 
probability distribution on two states. In phylogenetics the classic example is
to use the four DNA bases as the state space and hence the case of four state objects is
of interest. To this end we have generalized the tangle to the case of three
and four character states. This is a new result that to the best of the author's 
knowledge was previously unknown. Having successfully generalized the tangle 
we investigate how the tangle can be used to construct improved phylogenetic 
distance matrices. Additionally we study a set of Markov invariants which 
exist for the case of phylogenetic quartet trees. In the case of the evolution
of four taxa there are three possible historical evolutionary relationships. We 
show that these Markov invariants can be used to distinguish these three cases 
under the assumption of the most general Markov model. It is expected that 
the use of the tangle to construct distance matrices and using the Markov 
invariants to distinguish the three possible quartets will lead to improvements 
of the reconstruction of phylogenetic relationships from observed biomolecular 
data. 



Structure of the thesis 

Chapter 2 begins by introducing the mathematical material needed to understand
the results presented in this thesis. This includes a short introduction to
group representation theory, group characters and tensor product; a presenta- 
tion of the Schur/Weyl duality and the Schur functions; a definition of group 
invariant functions and their relation to one-dimensional representations. The 
chapter ends with several relevant examples of invariants of the general linear 
group. 

Chapter 3 begins with a light speed introduction to the formalism of quantum
mechanics, the concept of entanglement and mathematical analysis thereof 
using group invariant functions. The Markov model of phylogenetic trees is 
then developed in its usual presentation, followed by a change of formalism 
which makes apparent the analogy between phylogenetic trees and multiparticle
quantum systems. The chapter ends with a detailed analysis of the
invariant functions when evaluated upon a phylogenetic
tree. 

Chapter 4 gives a review of phylogenetic distance measures and shows how
the tangle invariant function used to analyse three qubit entanglement can 
be generalized to the phylogenetic case and used to improve popular distance 
measures. This is done by defining the branch lengths of a phylogenetic tree, 
reviewing the standard measure known as the log det and then using the tangle 
invariant to give a consistent distance measure for the case of quartets. 

Chapter 5 returns to the mathematical detail of Chapter 2 and derives invariant
functions that are more closely relevant to the Markov model of a
phylogenetic tree. This is done by first defining the Markov semigroup. The 
invariant functions of the general linear group are rederived using a technique 
which is generalized to derive the Markov invariants. Finally we examine 
the structure of the Markov invariants on a phylogenetic tree. In particular 
we concentrate on the quartet case where there exist four Markov invariants
which can be used to distinguish between the three possible quartet trees. 



Chapter 2 



Mathematical background 



In this chapter we will present the requisite mathematical background for 
developing the results presented in this thesis. It will be assumed that the 
reader is familiar with elementary concepts of algebra, most importantly the 
theory of groups and finite dimensional vector spaces (for example see [28]) and
the theory of Lie groups and the classical groups (see [12]). The presentation
will be brief and the reader interested in proofs is referred to the relevant 
literature as the discussion progresses. Our aim is to show how representation 
theory of groups - most notably the Schur/Weyl duality - can be used to count 
and construct the group invariant functions on a multi-linear (tensor product) 
space. We will develop some explicit invariants for the general linear group 
using a method which is known intuitively to many mathematical physicists 
and we formalize the technique. 



2.1 Group representations 

Throughout this thesis we will be interested in the vector spaces $\mathbb{C}^n$ and $\mathbb{R}^n$.
Almost all of the results presented will be equally valid whether one considers 
the complex or real space. Hence, we will simply refer to the vector space V, 
making the distinction between the real and complex case only when confusion 
may arise. For proofs of theorems that will be presented and further discussion 
of group representation theory the reader is referred to the excellent texts 
[271 [351112]. 

Definition 2.1.1. A group representation $\rho$ on the vector space $V$ is a homomorphism
from a group $G$ to the set of invertible, linear transformations
$GL(V)$. The image of the element $g \in G$ is denoted by $\rho(g)$ and the dimension
of the representation is taken to be the dimension of the corresponding vector
space.

A simple example of a group representation is constructed from the symmetric
group on $n$ elements, $S_n$, by taking a given group element $\sigma \in S_n$ to simply
permute the basis vectors of the $n$-dimensional vector space $V$:

$$\rho(\sigma) e_i := e_{\sigma i}.$$

It is clear that we have $\rho(\sigma\sigma') = \rho(\sigma)\rho(\sigma')$ so that $\rho$ is indeed a homomorphism
from $S_n$ to $GL(V)$.
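
As an illustrative check of the homomorphism property, the following Python sketch builds the permutation matrices and compares $\rho(\sigma\sigma')$ with $\rho(\sigma)\rho(\sigma')$ for every pair of elements. It is a sketch under stated assumptions: the function names are hypothetical, indices are 0-based, and the product $\sigma\sigma'$ is assumed to mean "apply $\sigma'$ first, then $\sigma$".

```python
from itertools import permutations


def perm_matrix(sigma):
    """Matrix of rho(sigma), where rho(sigma) e_i = e_{sigma(i)} (0-based)."""
    n = len(sigma)
    # Column i has a single 1, located in row sigma[i].
    return [[1 if sigma[i] == r else 0 for i in range(n)] for r in range(n)]


def compose(sigma, tau):
    """Assumed convention: (sigma tau)(i) = sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))


def matmul(a, b):
    """Plain matrix product of two square integer matrices."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def is_homomorphism(n):
    """Check rho(sigma tau) == rho(sigma) rho(tau) for all pairs in S_n."""
    group = list(permutations(range(n)))
    return all(
        perm_matrix(compose(s, t)) == matmul(perm_matrix(s), perm_matrix(t))
        for s in group
        for t in group
    )


print(is_homomorphism(3))
```

With the opposite composition convention the same check would hold for $\rho(\sigma')\rho(\sigma)$ instead, so the convention must be fixed before comparing with other texts.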

We will often be interested in the case where the abstract group is a matrix 
group such as the general linear group GL(V) which is, of course, defined by 
its action on the vector space V. To avoid confusion, we will refer to this 
representation as the defining representation. To increase confusion we will 
write elements of the defining representation simply as g. 
Given a matrix group G, there is always a one-dimensional representation 
defined by the determinant function: 

$$\det : G \to \mathbb{C}^*,$$

where $\mathbb{C}^* = \mathbb{C} \setminus \{0\}$ is the multiplicative group of non-zero complex numbers.
The multiplicative property of the determinant:

$$\det(g_1 g_2) = \det(g_1) \det(g_2),$$

ensures that the determinant function defines a group homomorphism.

Definition 2.1.2. A subspace $U \subseteq V$ is invariant under the group representation
$\rho$ if for all $u \in U$ it follows that $\rho(g)u \in U$ for all $g \in G$.

The notion of invariant subspaces allows us to break a given representation into 
its essential parts. That is, we can simplify the representation by considering 
its action upon the invariant subspaces alone. 

Definition 2.1.3. A representation is reducible if there exists a non-trivial
invariant subspace $U$. An irreducible representation is one which has no non-trivial
invariant subspaces. A representation is decomposable if there exist
non-trivial invariant subspaces $U$ and $W$ such that $V = U \oplus W$, and indecomposable
otherwise. A representation is completely reducible if whenever there
exists a non-trivial invariant subspace $U$, then there exists a second non-trivial
invariant subspace $W$ such that $V = U \oplus W$.

The matrix interpretation of a completely reducible representation is that there 
exists a basis where the matrix representation of each group element takes on 
a block-diagonal form. We will be exclusively interested in integral represen- 
tations of the general linear group and its subgroups. Integral representations 
are those in which the entries of the representation matrix are polynomials in 
the matrix entries of GL(V) with respect to a particular basis. The integral 
representations of $GL(V)$ are completely reducible [33].

Definition 2.1.4. The representations $\rho_1$ and $\rho_2$ are said to be equivalent if
there exists an invertible linear transformation $S$ on $V$ such that

$$S \rho_1(g) S^{-1} = \rho_2(g)$$

for all $g \in G$.

From these considerations we can conclude that a given integral representation
of the general linear group can be decomposed as

$$\rho = \bigoplus_a \rho_a,$$

where each $\rho_a$ is an irreducible representation.

2.1.1 Group characters

Definition 2.1.5. The character of a representation $\rho$ is defined as the trace
function:

$$\chi(g) = \mathrm{tr}(\rho(g)).$$

It follows immediately that the character is unaffected by similarity transformations:

$$\mathrm{tr}(S \rho(g) S^{-1}) = \mathrm{tr}(\rho(g) S^{-1} S) = \mathrm{tr}(\rho(g)),$$

and is hence the same for equivalent representations.

The problem of classifying irreducible representations reduces to identifying 
the characters. Although the following result is valid only for finite groups, 
we will see that understanding the representation theory of $S_n$ (a finite group)
is crucial to constructing the irreducible representations of $GL(V)$ (an infinite
group).

2.1.6. For a finite group $G$, the number of non-equivalent irreducible representations
of $G$ is equal to the number of conjugacy classes of $G$.

For example the conjugacy classes of the symmetric group can be found by
considering the cycle notation which presents an element of $S_n$ as a product of
disjoint cycles. The lengths of these cycles add to $n$ and hence we get the well
known result that the conjugacy classes of $S_n$ are labelled by the partitions of
$n$. (We will discuss partitions in more detail in the next section.) To illustrate
this, consider that any element of the symmetric group can be written in the
following form:

$$\sigma = (i_1 i_2 \ldots i_{\alpha_1})(j_1 j_2 \ldots j_{\alpha_2}) \cdots (l_1 l_2 \ldots l_{\alpha_p}).$$

This element belongs to the conjugacy class which is specified by the partition
$\{\alpha_1, \alpha_2, \ldots, \alpha_p\}$ where $\alpha_1 + \alpha_2 + \ldots + \alpha_p = n$. The fundamental result follows:

2.1.7. The irreducible representations of the symmetric group $S_n$ can be labelled
by the partitions of $n$.
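
The labelling of cycle structures by partitions can be checked by brute force for small $n$. The sketch below (hypothetical function names, 0-based permutations stored as tuples of images) computes the cycle type of every element of $S_n$ and compares the set of cycle types with the set of partitions of $n$. It only verifies that cycle types and partitions coincide; it does not test conjugacy directly.

```python
from itertools import permutations


def cycle_type(sigma):
    """Cycle lengths of a permutation, sorted into a partition of len(sigma)."""
    seen, lengths = set(), []
    for start in range(len(sigma)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:       # walk around the cycle containing `start`
            seen.add(i)
            i = sigma[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest


def cycle_types(n):
    """Set of all cycle types occurring in S_n."""
    return {cycle_type(s) for s in permutations(range(n))}


print(sorted(cycle_types(4), reverse=True))
print(cycle_types(4) == set(partitions(4)))
```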

For example we consider the representation on the $n$-dimensional vector space
$V$ of the symmetric group $S_n$ defined, as above, by

$$\rho(\sigma) e_i = e_{\sigma i}.$$

Introducing the change of basis

$$z_0 = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} e_i,$$

$$z_a = \frac{1}{\sqrt{a(a+1)}} \left( \sum_{i=1}^{a} e_i - a e_{a+1} \right), \quad a = 1, 2, \ldots, n-1.$$

It is clear that $z_0$ spans a one-dimensional invariant subspace

$$\rho(\sigma) z_0 = z_0,$$

and we have

$$\rho(\sigma) z_a = \frac{1}{\sqrt{a(a+1)}} \left( \sum_{i=1}^{a} e_{\sigma i} - a e_{\sigma(a+1)} \right),$$

which itself belongs to the span of $\{z_1, z_2, \ldots, z_{n-1}\}$ which is consequently a
complementary invariant space. To prove this consider the standard inner
product:

$$(e_i, e_j) := \delta_{ij},$$

and show that

$$(\rho(\sigma) z_a, z_0) = 0, \quad \forall \sigma \in S_n.$$

The representation of the symmetric group on the subspace spanned by $z_0$ corresponds to
the partition of $n$ consisting of a single element: $\{n\}$.

Another one-dimensional representation of the symmetric group can be constructed
by taking the sign of the permutation

$$\mathrm{sgn}(\sigma) = \pm 1,$$

with the representation space $\mathbb{C}$. This representation corresponds to the partition
$\{1, 1, \ldots, 1\}$ with $1 + 1 + \ldots + 1 = n$.



2.1.2 Tensor product 

The dual of the vector space, $V$, is denoted as $V^*$ and defined to be the set of
linear functionals $\{f : V \to \mathbb{C}\}$:

$$f(cv) = c f(v),$$

$$f(v + v') = f(v) + f(v'),$$

for all $c \in \mathbb{C}$ and $v, v' \in V$. Of course $V^*$ itself forms a vector space and we
use the basis $\xi_1, \xi_2, \ldots, \xi_n$ such that $\xi_i(e_j) = \delta_{ij}$. Since $V$ and $V^*$ are complex
vector spaces of identical dimension they must be isomorphic and, for $v = \sum_{i=1}^{n} v_i e_i$, we define
the linear functional $v^*$ as

$$v^* := \sum_{i=1}^{n} v_i \xi_i,$$

so that

$$v^*(v') = \sum_{i=1}^{n} v_i v'_i.$$

With these definitions in hand we consider bi-linear functionals on the ordered
product of two vector spaces $V_1$ and $V_2$ with bases $\{e_i^{(1)}\}$ and $\{e_j^{(2)}\}$ respectively.
Such functionals map $V_1 \times V_2$ to $\mathbb{C}$ and satisfy

$$f(cv_1, v_2) = c f(v_1, v_2) = f(v_1, cv_2),$$
$$f(v_1 + v_1', v_2) = f(v_1, v_2) + f(v_1', v_2),$$
$$f(v_1, v_2 + v_2') = f(v_1, v_2) + f(v_1, v_2'),$$

for all $c \in \mathbb{C}$, $v_1, v_1' \in V_1$ and $v_2, v_2' \in V_2$. Again this set of functionals forms
a vector space which we denote as $(V_1 \otimes V_2)^*$ with basis given by the set of
functionals $\xi_i^{(1)} \otimes \xi_j^{(2)}$ defined as

$$\xi_i^{(1)} \otimes \xi_j^{(2)} (e_k^{(1)}, e_l^{(2)}) := \xi_i^{(1)}(e_k^{(1)}) \, \xi_j^{(2)}(e_l^{(2)}).$$

From which it follows that the bi-linear functional $f$ can be written as

$$f = \sum_{i,j} f_{ij} \, \xi_i^{(1)} \otimes \xi_j^{(2)},$$

where $f_{ij} = f(e_i^{(1)}, e_j^{(2)})$. From this we can induce the definition of the tensor
product of $V_1$ and $V_2$ to be the vector space $V_1 \otimes V_2$. A given element $\psi \in V_1 \otimes V_2$
is referred to as a tensor and can be expressed uniquely in the form

$$\psi = \sum_{i,j} \psi_{ij} \, e_i^{(1)} \otimes e_j^{(2)}.$$

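This process can be iterated to the tensor product of multiple vector spaces
$\mathcal{H} = V_1 \otimes V_2 \otimes \cdots \otimes V_m$ where a given element $\psi \in \mathcal{H}$ can be expressed as

$$\psi = \sum_{i_1, i_2, \ldots, i_m} \psi_{i_1 i_2 \ldots i_m} \, e_{i_1}^{(1)} \otimes e_{i_2}^{(2)} \otimes \ldots \otimes e_{i_m}^{(m)}.$$

The tensor product space satisfies the axioms of a vector space with addition
and scalar multiplication defined in the obvious way:

$$c \cdot \psi = \sum_{i_1, i_2, \ldots, i_m} c \psi_{i_1 i_2 \ldots i_m} \, e_{i_1}^{(1)} \otimes e_{i_2}^{(2)} \otimes \ldots \otimes e_{i_m}^{(m)},$$

$$\psi + \psi' = \sum_{i_1, i_2, \ldots, i_m} (\psi_{i_1 i_2 \ldots i_m} + \psi'_{i_1 i_2 \ldots i_m}) \, e_{i_1}^{(1)} \otimes e_{i_2}^{(2)} \otimes \ldots \otimes e_{i_m}^{(m)}.$$

When one is taking the tensor product of a single vector space we use the
notation

$$V^{\otimes m} := V \otimes V \otimes \ldots \otimes V.$$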
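Again, $\mathcal{H} := V^{\otimes m}$ must be isomorphic to $\mathcal{H}^* = (V^*)^{\otimes m}$ and we define

$$\psi^* := \sum_{i_1, i_2, \ldots, i_m} \psi_{i_1 i_2 \ldots i_m} \, \xi_{i_1} \otimes \xi_{i_2} \otimes \ldots \otimes \xi_{i_m},$$

so that

$$\psi^*(\psi') = \sum_{i_1, i_2, \ldots, i_m} \psi_{i_1 i_2 \ldots i_m} \psi'_{i_1 i_2 \ldots i_m}.$$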

2.1.3 Group action on a tensor product space 

Given a set of representations of a group

$$\rho_a : G \to GL(V_a), \quad a = 1, 2, \ldots, m,$$

it is possible to construct a new representation $\rho$ by taking the tensor product

$$\mathcal{H} = V_1 \otimes V_2 \otimes \ldots \otimes V_m$$

and define the tensor product representation on the vector space $\mathcal{H}$ to act as:

$$\rho(g)\psi := \rho_1(g) \otimes \rho_2(g) \otimes \cdots \otimes \rho_m(g) \psi
= \sum_{i_1, i_2, \ldots, i_m} \psi_{i_1 i_2 \ldots i_m} \, \rho_1(g) e_{i_1}^{(1)} \otimes \rho_2(g) e_{i_2}^{(2)} \otimes \ldots \otimes \rho_m(g) e_{i_m}^{(m)}.$$

In contrast to this we consider another important case which occurs when we
have the direct (cartesian) product of $m$ groups:

$$G = G_1 \times G_2 \times \ldots \times G_m,$$

with representations $\rho_1, \rho_2, \ldots, \rho_m$ and associated representation spaces

$$V_1, V_2, \ldots, V_m.$$

It is again possible to define a representation $\rho$ on $\mathcal{H}$ as

$$\rho(g)\psi = \rho(g_1 \times g_2 \times \ldots \times g_m)\psi := \rho_1(g_1) \otimes \rho_2(g_2) \otimes \ldots \otimes \rho_m(g_m) \psi.$$

For future use we define the notation

$$\times^m G := G \times G \times \ldots \times G,$$
$$\otimes^m g := g \otimes g \otimes \ldots \otimes g.$$

Presently we will recall the character theory of the general linear group to 
enable us to decompose such representations into their irreducible parts. 
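
Both constructions can be illustrated numerically for $m = 2$ with the Kronecker product, which is the matrix of $g \otimes h$ in the basis $e_i \otimes e_j$. The sketch below uses hypothetical helper names and two arbitrarily chosen invertible integer matrices; it checks that $g \mapsto g \otimes g$ respects products, that a direct product acts componentwise, and that the trace of $g \otimes h$ is the product of the traces.

```python
def matmul(a, b):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def kron(a, b):
    """Kronecker product: rows indexed by (i, j), columns by (k, l)."""
    return [[a[i][k] * b[j][l]
             for k in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for j in range(len(b))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


# Two arbitrary invertible matrices (determinants 6 and -14).
g = [[2, 1], [0, 3]]
h = [[1, 4], [5, 6]]

# Single group acting diagonally: (gh) (x) (gh) == (g (x) g)(h (x) h).
print(kron(matmul(g, h), matmul(g, h)) == matmul(kron(g, g), kron(h, h)))

# Direct product group: (g1 x g2)(h1 x h2) acts as (g1 h1) (x) (g2 h2).
print(matmul(kron(g, h), kron(h, g)) == kron(matmul(g, h), matmul(h, g)))

# The character of a tensor product is the product of the characters.
print(trace(kron(g, h)) == trace(g) * trace(h))
```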

1 Interestingly, in quantum physics the appropriate description of a multi-particle system 
is given by taking the tensor product of different representations of a single group, such as 
the orthogonal or Lorentz groups, where the choice of each representation is fixed by the 
individual particle types. Whereas in the case of stochastic models of phylogenetics the 
reverse is the case; the system is described by taking the group action on the tensor product 
space as the direct product of the defining representation of the Markov semigroup. 


2.2 Irreducible representations of the general linear group

It is well known from group representation theory that the finite-dimensional 
irreducible representations of the general linear and the symmetric group can 
be put into a correspondence. This result is known as the Schur/Weyl duality. 
As we saw above, the irreducible representations of the symmetric group on 
n elements can be labelled by the partitions of n. Additionally, there exist 
algorithms for explicitly constructing these irreducible representations once a 
partition has been specified. Here we will show how the irreducible representations
of the general linear group on $V$ occur as subspaces of the tensor
product space $V^{\otimes m}$. The projections onto these subspaces are constructed using operators known
as Young's operators which are computed from the partitions of $m$.

2.2.1 Partitions 

A finite sequence of positive integers

$$\lambda = \{\lambda_1, \lambda_2, \ldots\}$$

with $\lambda_1 \geq \lambda_2 \geq \ldots$, is an (ordered) partition of the integer $n$ if the weight of
the partition,

$$|\lambda| := \lambda_1 + \lambda_2 + \ldots,$$

satisfies $|\lambda| = n$.

It is usual to use a notation which indicates the number of times each integer
occurs as a part:

$$\lambda = \{\ldots, r^{m_r}, \ldots, 2^{m_2}, 1^{m_1}\}$$

so that $m_i$ of the parts of $\lambda$ are equal to $i$. It is useful to represent a given
partition as a Ferrers diagram by drawing a row of squares for each part
of the partition, and placing these rows upon each other sequentially such
that the rows decrease in length down the page. For example the partition
$\lambda = \{5, 3^2, 2, 1\}$ is represented by:

□ □ □ □ □
□ □ □
□ □ □
□ □
□

Definition 2.2.1. A Young tableau, $T$, of shape $\lambda$ with $|\lambda| = n$ is an assignment
of the integers $1, 2, \ldots, n$ to a Ferrers diagram such that the rows and
columns are strictly increasing. A semi-standard tableau, $T'$, allows repeated
integers: the rows need only be weakly increasing, while the columns remain strictly increasing.

For example, the canonical Young tableau of shape $\{5, 3^2, 2, 1\}$ is:

1  2  3  4  5
6  7  8
9  10 11
12 13
14

while a semi-standard tableau of the same shape is:

1 1 1 2 3
3 4 4
4 5 6
6 6
7

Definition 2.2.2. The ring of symmetric functions, $\Lambda_n = \mathbb{Z}[x_1, \ldots, x_n]^{S_n}$, is
the set of polynomials in $n$ independent variables $x_1, \ldots, x_n$ which are invariant
under the representation of $S_n$ defined by permutations of the variables.

That is, $f$ is a symmetric function if and only if:

$$f(x_1, x_2, \ldots, x_n) = f(x_{\sigma 1}, x_{\sigma 2}, \ldots, x_{\sigma n}), \quad \forall \sigma \in S_n.$$

It is clear that $\Lambda_n$ is a graded ring:

$$\Lambda_n = \bigoplus_{d \geq 0} \Lambda_n^d,$$

where $\Lambda_n^d \subset \Lambda_n$ consists of the homogeneous symmetric polynomials of degree
d. Various bases exist for the ring of symmetric functions (see j3Tj). The basis 
which will be of use to us is given by the Schur functions. 

2.2.2 The Schur functions 

For a given partition $\lambda$ define the monomial $x^\lambda := x_1^{\lambda_1} x_2^{\lambda_2} \cdots x_n^{\lambda_n}$. Consider the
polynomial which is obtained by anti-symmetrizing:

$$a_\lambda = a_\lambda(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \, \sigma(x^\lambda),$$

where

$$\sigma(x^\lambda) := x_{\sigma 1}^{\lambda_1} x_{\sigma 2}^{\lambda_2} \cdots x_{\sigma n}^{\lambda_n}.$$

By considering the partition $\delta = \{n-1, n-2, \ldots, 1\}$ it follows that

$$a_\delta = \prod_{1 \leq i < j \leq n} (x_i - x_j),$$

which is called the Vandermonde determinant. The Schur functions are then
defined as the quotient

$$s_\lambda = s_\lambda(x_1, x_2, \ldots, x_n) = a_{\lambda+\delta} / a_\delta,$$

which is clearly symmetric. A more intuitive and constructive way of defining
the Schur functions is to take:

$$s_\lambda = \sum_{T'} x^{T'},$$

where the summation is over all semi-standard $\lambda$ tableaux $T'$. For example,
for $\lambda = \{2, 1\}$ the semi-standard tableaux are displayed in Figure 2.1. In this
case each tableau corresponds to a monomial $x^{T'}$ to give

$$s_{21}(x_1, x_2, x_3) = x_1^2 x_2 + x_1^2 x_3 + x_1 x_2^2 + 2 x_1 x_2 x_3 + x_1 x_3^2 + x_2^2 x_3 + x_2 x_3^2,$$

which is easily seen to be a symmetric polynomial.

Figure 2.1: Semi-standard tableaux
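
The two definitions of the Schur function can be compared directly for small cases. The following Python sketch (all function names are hypothetical) enumerates the semi-standard tableaux of a given shape with entries in $1, \ldots, n$, assuming the usual convention that rows are weakly increasing and columns strictly increasing, collects the resulting monomials, and compares the value at an integer point with the quotient $a_{\lambda+\delta}/a_\delta$, where $\delta$ is padded with a trailing zero.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def semistandard_tableaux(shape, n):
    """Yield fillings of `shape` with entries 1..n as {(row, col): value}."""
    cells = [(r, c) for r, length in enumerate(shape) for c in range(length)]

    def fill(k, tab):
        if k == len(cells):
            yield dict(tab)
            return
        r, c = cells[k]
        low = 1
        if c > 0:
            low = max(low, tab[(r, c - 1)])      # weakly increasing along rows
        if r > 0:
            low = max(low, tab[(r - 1, c)] + 1)  # strictly increasing down columns
        for value in range(low, n + 1):
            tab[(r, c)] = value
            yield from fill(k + 1, tab)
            del tab[(r, c)]

    yield from fill(0, {})


def schur_monomials(shape, n):
    """Map exponent tuples to coefficients in s_shape(x_1, ..., x_n)."""
    counts = Counter()
    for tab in semistandard_tableaux(shape, n):
        exponents = [0] * n
        for value in tab.values():
            exponents[value - 1] += 1
        counts[tuple(exponents)] += 1
    return counts


def evaluate(monomials, xs):
    total = 0
    for exponents, coeff in monomials.items():
        term = coeff
        for x, e in zip(xs, exponents):
            term *= x ** e
        total += term
    return total


def sign(sigma):
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1


def alternant(exponents, xs):
    """a_mu(x): signed sum over sigma of prod_i x_{sigma(i)}^{mu_i}."""
    total = 0
    for sigma in permutations(range(len(xs))):
        term = sign(sigma)
        for i, e in enumerate(exponents):
            term *= xs[sigma[i]] ** e
        total += term
    return total


def schur_bialternant(shape, xs):
    """Quotient a_{lambda+delta} / a_delta; xs must be pairwise distinct."""
    n = len(xs)
    lam = list(shape) + [0] * (n - len(shape))
    delta = [n - 1 - i for i in range(n)]
    shifted = [l + d for l, d in zip(lam, delta)]
    return Fraction(alternant(shifted, xs), alternant(delta, xs))


s21 = schur_monomials((2, 1), 3)
print(sorted(s21.items()))
print(evaluate(s21, (2, 3, 5)) == schur_bialternant((2, 1), (2, 3, 5)))
```

The monomial $x_1 x_2 x_3$ appears with coefficient 2 because two distinct tableaux produce it, which matches the expansion of $s_{21}$ given above.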



2.2.3 Group characters of GL(n) 

For a given matrix $g \in GL(n)$ it is possible to use the Jordan decomposition
to put it in upper triangular form and hence the character is simply the sum
of the eigenvalues:

$$\chi(g) = \mathrm{tr}(g) = x_1 + x_2 + \cdots + x_n.$$

This corresponds to the Schur function

$$s_{\{1\}}(x_1, \ldots, x_n) = x_1 + x_2 + \cdots + x_n.$$

By considering the tensor product representation of $GL(V)$ on $V \otimes V$ we obtain
as irreducible subspaces the symmetric and anti-symmetric tensors,
with dimensions $\frac{1}{2}n(n+1)$ and $\frac{1}{2}n(n-1)$ respectively. We have

where 

so that the decomposition of V ® V under the action of GL(n) into irreducible 
subspaces is given by 

$$V \otimes V = V^{\{2\}} \oplus V^{\{1^2\}}.$$

Now suppose we take the group element $g \in GL(n)$. It follows from an elementary
calculation that the character of this group element on the representation
$V^{\otimes 2}$ is simply the product:

$$\chi(g \otimes g) = (x_1 + x_2 + \ldots + x_n)(x_1 + x_2 + \ldots + x_n).$$

In terms of the Schur functions it follows that we have the decomposition

$$\chi(g \otimes g) = s_{\{1\}}(x) s_{\{1\}}(x) = s_{\{2\}}(x) + s_{\{1^2\}}(x),$$

where

$$s_{\{2\}}(x) = (x_1^2 + x_1 x_2 + x_1 x_3 + \ldots + x_2^2 + x_2 x_3 + \ldots + x_n^2),$$

$$s_{\{1^2\}}(x) = (x_1 x_2 + x_1 x_3 + \ldots + x_2 x_3 + \ldots + x_{n-1} x_n).$$

Thus we see that the decomposition of the tensor product representation into 
irreducible parts can be inferred by using the Schur functions as a basis for the 
ring of symmetric functions. This is the archetypal example from physics and 
leads to the full Schur/Weyl duality which allows us to classify the irreducible 
representations of GL(n) (and its subgroups) by simply using the character 
formulas and the Schur functions. 
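
A quick numerical check of this decomposition, written as a minimal sketch with hypothetical function names: $s_{\{2\}}$ is taken to be the sum of all products $x_i x_j$ with $i \leq j$ and $s_{\{1^2\}}$ the sum with $i < j$, as in the expressions above. Setting every eigenvalue to one recovers the dimensions of the symmetric and anti-symmetric subspaces.

```python
from itertools import combinations, combinations_with_replacement


def s2(xs):
    """s_{2}(x): sum of x_i x_j over i <= j."""
    return sum(a * b for a, b in combinations_with_replacement(xs, 2))


def s11(xs):
    """s_{1^2}(x): sum of x_i x_j over i < j."""
    return sum(a * b for a, b in combinations(xs, 2))


xs = (2, 3, 5, 7)   # arbitrary sample eigenvalues
n = len(xs)

# Character of V (x) V equals the sum of the two irreducible characters.
print(sum(xs) ** 2 == s2(xs) + s11(xs))

# Dimensions follow by evaluating the characters at (1, ..., 1).
ones = (1,) * n
print(s2(ones) == n * (n + 1) // 2, s11(ones) == n * (n - 1) // 2)
```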

2.2.4 The Schur/Weyl duality 

In this section we will construct the Schur/Weyl duality which states that 
the irreducible representations of the general linear group and that of the 
symmetric group can be put into correspondence. 

2.2.3. If $V$ decomposes into the direct sum $V = U \oplus W$ where $U$ and $W$
are invariant subspaces under the group representation $\rho$, then the projection
operator $P$, defined by $PV = U$, satisfies

$$P\rho(g) = \rho(g)P, \quad \forall g \in G, \qquad (2.2)$$

and similarly for the orthogonal projection $(1 - P)$. Conversely, if $P$ is a
projection operator satisfying (2.2) then the subspace it projects to is invariant
under $\rho$.

Consider the representation of the symmetric group on $V^{\otimes m}$ defined by

$$\sigma(e_{i_1} \otimes e_{i_2} \otimes \ldots \otimes e_{i_m}) := e_{i_{\sigma 1}} \otimes e_{i_{\sigma 2}} \otimes \ldots \otimes e_{i_{\sigma m}}.$$

It should be clear that the action of any such element of the symmetric group
will commute with the tensor product representation of $GL(n)$. In addition to
this the algebra generated from this action will commute with $GL(n)$ and hence
can be used to construct projection operators which satisfy (2.2). Presently
we will discuss how to construct such projection operators such that the corresponding
invariant subspaces are in fact irreducible.

Consider a Young tableau with shape $\lambda$ and $|\lambda| = m$. Consider the permutations
$p$ which interchange the integers in the same row, and, conversely,
permutations $q$ which interchange numbers in the same column. In the algebra
of the symmetric group action defined above, consider the quantities

$$P = \sum_p p,$$

$$Q = \sum_q \mathrm{sgn}(q) \, q.$$

The Young operator corresponding to the standard tableau $T$ is then defined
to be

$$Y = QP,$$

and we have the fundamental result: 

2.2.4. For a given partition $\lambda$, $Y$ projects onto an irreducible subspace of $V^{\otimes m}$
under the tensor product representation of $GL(n)$. Young tableaux of the same
shape label equivalent representations.

Now suppose $Y_\lambda$ is the Young operator corresponding to the partition $\lambda$. We
define the subspace

$$V^\lambda := Y_\lambda V^{\otimes m}.$$

It is possible to prove that the group character of the tensor product representation
of the general linear group on the subspace $V^\lambda$ is none other than the
Schur function $s_\lambda(x_1, x_2, \ldots, x_n)$.
For example we consider the standard tableau:

1 3
2

with corresponding Young operator given by

$$P = e + (13),$$
$$Q = e - (12),$$
$$Y = e + (13) - (12) - (123).$$

We also note that the dimension of the invariant subspace is given by setting
the eigenvalues in the Schur function equal to one:

$$\text{dimension of } V^\lambda = s_\lambda(1, 1, \ldots, 1).$$
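
The Young operator of this example can be reproduced in the group algebra of $S_3$. The sketch below is illustrative only: group algebra elements are stored as dictionaries from permutations to integer coefficients, the helper names are hypothetical, and products of permutations are assumed to be read left to right (the convention under which $(12)(13) = (123)$, as in the expression for $Y$ above). It also checks that $Y$ is proportional to a projection, here $Y^2 = 3Y$, so that $Y/3$ is idempotent.

```python
from collections import defaultdict

E = (1, 2, 3)  # identity, stored as the tuple of images of 1, 2, 3


def transposition(a, b, n=3):
    """The permutation swapping a and b, as a tuple of images."""
    images = list(range(1, n + 1))
    images[a - 1], images[b - 1] = b, a
    return tuple(images)


def compose(first, second):
    """Apply `first`, then `second` (assumed left-to-right convention)."""
    return tuple(second[first[i] - 1] for i in range(len(first)))


def multiply(x, y):
    """Product in the group algebra, dropping zero coefficients."""
    out = defaultdict(int)
    for p, cp in x.items():
        for q, cq in y.items():
            out[compose(p, q)] += cp * cq
    return {perm: c for perm, c in out.items() if c != 0}


P = {E: 1, transposition(1, 3): 1}    # row permutations of the tableau 1 3 / 2
Q = {E: 1, transposition(1, 2): -1}   # signed column permutations
Y = multiply(Q, P)

print(sorted(Y.items()))
print(multiply(Y, Y) == {perm: 3 * c for perm, c in Y.items()})
```

In the output, the tuple (2, 3, 1) is the cycle $(123)$, (3, 2, 1) is $(13)$ and (2, 1, 3) is $(12)$.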

2.2.5 More representations

From this construction we can build more representations such as

$$V^\mu \otimes V^\nu,$$

with group character which corresponds to the outer product of two Schur
functions which is defined as the pointwise product:

$$s_\mu s_\nu(x) := s_\mu(x) s_\nu(x) = \sum_\lambda c^\lambda_{\mu\nu} s_\lambda,$$

where $|\lambda| = |\mu| + |\nu|$ and the $c^\lambda_{\mu\nu}$ are integer coefficients which can be determined
by the Littlewood-Richardson rule [39| PLT].
Another way of constructing representations is to consider

$$(V^\mu)^\lambda.$$

The group character of this representation is given by another type of multiplication
of Schur functions known as the plethysm (defined formally in Macdonald
[H]). Here we use Young's tableaux to give a constructive definition.
Recall that we have

$$s_\mu(x) = \sum_{T'} x^{T'},$$

which is a summation of monomials in $x_1, x_2, \ldots, x_n$. If there are $m$ such
monomials in $s_\mu(x)$ and these are denoted by $y_i$, $1 \leq i \leq m$, then the plethysm
is given by

$$s_\lambda[s_\mu](x) = s_\lambda(y) = \sum_{T'} y^{T'}.$$

The plethysm $s_\lambda[s_\mu]$ can be interpreted as giving the character of the representation
$(V^\mu)^\lambda$. That is we take $V^\mu$ as the defining representation and
symmetrize this representation with $\lambda$.

Finally the inner product of two Schur functions is defined as

$$s_\mu(x) * s_\nu(x) = \sum_\lambda \gamma^\lambda_{\mu\nu} s_\lambda(x),$$

where $|\mu| = |\nu| = |\lambda| = n$ and the $\gamma^\lambda_{\mu\nu}$ are the integer multiplicities of the $\lambda$
representation of $S_n$ occurring in the decomposition of the tensor product of
the $\mu$ and $\nu$ representations of $S_n$. The inner product comes into play if we wish
to compute the character of $GL(n) \times GL(n')$ on $V^\lambda \otimes V^{\lambda'}$ with $|\lambda| = m$, $|\lambda'| = m'$.
The character of this representation is $s_\lambda(x) s_{\lambda'}(y)$ where $x_1, x_2, \ldots, x_n$ and
$y_1, y_2, \ldots, y_{n'}$ are the eigenvalues of the relevant group elements in $GL(n)$ and
$GL(n')$ respectively. The decomposition of characters is given by the formula

$$s_\lambda(x) s_{\lambda'}(y) = \sum_\rho \gamma^\rho_{\lambda\lambda'} s_\rho(xy),$$

where $(xy) = (x_1 y_1, x_1 y_2, \ldots, x_2 y_1, \ldots, x_n y_{n'})$ [H].

We will often write the Schur function $s_\lambda$ simply as $\{\lambda\}$ and the plethysm will
sometimes be written as

\[ s_\lambda[s_\mu] \equiv \{\mu\}^{\otimes\{\lambda\}}. \]

In practice we compute Schur multiplications by using the group theory
software Schur [63]. For further discussion of Schur functions and their various
multiplications see [2 QUI OH QH [39].

As an example consider the defining representation of $GL(n)$ on the tensor
product $V^{\otimes m}$. That is

\[ \psi \mapsto g \otimes g \otimes \cdots \otimes g\, \psi := \otimes^m g\, \psi, \]

for $\psi \in V^{\otimes m}$, $g \in GL(n)$. The character of this representation is given by
the pointwise product of $m$ copies of $s_{\{1\}}(x)$ and can be decomposed into
irreducible characters by using the Littlewood-Richardson coefficients $c^\lambda_{\mu\nu}$. In
the case where $m = 4$, Schur gives

\[ s_{\{1\}}(x)\, s_{\{1\}}(x)\, s_{\{1\}}(x)\, s_{\{1\}}(x) =
   s_{\{4\}}(x) + 3 s_{\{31\}}(x) + 2 s_{\{2^2\}}(x) + 3 s_{\{21^2\}}(x) + s_{\{1^4\}}(x). \]

This tells us that under the action of $GL(n)$ the tensor product $V^{\otimes 4}$
decomposes into irreducible subspaces:

\[ V^{\otimes 4} = V^{\{4\}} + 3V^{\{31\}} + 2V^{\{2^2\}} + 3V^{\{21^2\}} + V^{\{1^4\}}, \]

where the multiplicities account for the number of legal standard tableaux for
each partition.
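The multiplicities 1, 3, 2, 3, 1 can be reproduced by counting standard tableaux. The following sketch is an illustration rather than part of the thesis: it assumes the standard hook length formula $f^\lambda = m!/\prod h(i,j)$, and the helper name `num_standard_tableaux` is hypothetical.

```python
from math import factorial


def num_standard_tableaux(shape):
    """Count standard Young tableaux of the given shape via the hook length formula."""
    # Column lengths of the diagram (conjugate partition).
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hook_product = 1
    for i, row_len in enumerate(shape):
        for j in range(row_len):
            hook_product *= (row_len - j) + (cols[j] - i) - 1
    return factorial(sum(shape)) // hook_product


# Partitions of 4 in the order {4}, {31}, {2^2}, {21^2}, {1^4}.
partitions_of_4 = [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
multiplicities = [num_standard_tableaux(p) for p in partitions_of_4]

if __name__ == "__main__":
    print(multiplicities)
```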

2.2.6 One-dimensional representations 

Recall that the dimension of the irreducible representation $\lambda$ is given by
$s_\lambda(1, 1, \ldots, 1)$. It follows that the one-dimensional representations occur when
there is only a single semi-standard tableau with shape $\lambda$. In the case when $V$
is $n$-dimensional it should be clear that the one-dimensional representations
occur when we have $\lambda = \{k^n\}$ for some $k$.
Consider the character of $GL(n)$ on $V^{\{k^n\}}$:

\[ s_{\{k^n\}}(x) = (x_1 x_2 \cdots x_n)^k = \det(g)^k. \]

Thus for any $\eta \in V^{\{k^n\}}$ we have

\[ \eta \mapsto \det(g)^k \eta, \]

under the $\{k^n\}$ representation of $GL(n)$.
2.3 Invariant theory

Given the defining representation of a group $G$ on a vector space $V$, it is
possible to define a representation which acts on the vector space of functions
$f : V \to \mathbb{C}$ as

\[ f \mapsto g f := f \circ g^{-1}. \qquad (2.3) \]

(It is necessary to take the inverse of the group element to ensure that the
induced representation satisfies the properties of a group homomorphism.)
An invariant with weight $k \in \mathbb{N}$ is then defined as any function which satisfies

\[ g^{-1} f = f \circ g = \det(g)^k f. \qquad (2.4) \]

We will be exclusively interested in the case where $f$ is a polynomial in the
dual vector space $V^*$ with basis elements $\{\xi_1, \xi_2, \ldots, \xi_n\}$. In order to generate
polynomials in this space multiplication is defined pointwise:

\[ \xi_a \xi_b(x) := \xi_a(x)\, \xi_b(x), \qquad 1 \le a, b \le n. \]

The full set of polynomials generated from this construction is denoted as
$\mathbb{C}[V]$. A homogeneous polynomial satisfies

\[ f \circ c\mathbb{1} = c^d f, \qquad c \in \mathbb{C}, \]

for some positive integer $d$, which is referred to as the degree. From elementary
considerations it follows that $\mathbb{C}[V]$ has the structure of a graded algebra over
the degree:

\[ \mathbb{C}[V] = \bigoplus_d \mathbb{C}[V]_d, \]

where $\mathbb{C}[V]_d$ is the set of homogeneous polynomials of degree $d$.

By counting the degree of the various algebraic quantities we see that $d = nk$,
and we denote

\[ \mathbb{C}[V]^G_d = \{ f \in \mathbb{C}[V]_d \mid f \circ g = \det(g)^k f \ \ \forall g \in G \}. \]

Of course we have already studied invariant functions on a finite group! The
symmetric functions are none other than the set $\mathbb{C}[V]^{S_n}$ with $k = 0$. Another
example comes from the classical groups, which are defined by imposing
invariant functions. For example the orthogonal group $O(n)$ acting on $\mathbb{R}^n$ can be
considered to be defined by the invariant function

\[ \sum_{i=1}^{n} x_i^2. \]

Consider the tensor product space $\mathbb{C}^2 \otimes \mathbb{C}^2$ with associated group action
$GL(2) \times GL(2)$. The following relation holds for any
$\psi = \sum_{ij} \psi_{ij}\, e_i \otimes e_j \in \mathbb{C}^2 \otimes \mathbb{C}^2$:

\[ (\psi'_{11}\psi'_{22} - \psi'_{12}\psi'_{21}) = \det g\, (\psi_{11}\psi_{22} - \psi_{12}\psi_{21}), \]

where $\psi' := g_1 \otimes g_2\, \psi$ and $\det g = \det g_1 \det g_2$. So $(\psi_{11}\psi_{22} - \psi_{12}\psi_{21})$ is an
invariant.
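The relation above is easy to confirm on a concrete example. The sketch below is only an illustration with arbitrarily chosen integer matrices; it uses the fact that, in components, $\psi' = g_1 \otimes g_2\, \psi$ reads $\psi'_{ij} = \sum_{ab} (g_1)_{ia} (g_2)_{jb} \psi_{ab}$, that is, the matrix product $g_1 \psi g_2^t$.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Arbitrary example data (assumptions made for illustration only).
psi = [[2, -1], [3, 5]]
g1 = [[1, 2], [3, 4]]
g2 = [[0, 1], [-1, 3]]

# psi' = (g1 ⊗ g2) psi, written as a matrix equation.
psi_prime = matmul(matmul(g1, psi), transpose(g2))

if __name__ == "__main__":
    print(det2(psi_prime), det2(g1) * det2(g2) * det2(psi))
```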
2.3.1 Invariants as irreducible representations

In this section we will show that the group invariant polynomials $\mathbb{C}[V^{\otimes m}]^{GL(n)}_d$
occur exactly as the one-dimensional representations $V^{\{k^n\}}$ in the
decomposition of $(V^{\otimes m})^{\{d\}}$ with $md = kn$. As a first step consider a vector space $U$. We
establish the vector space isomorphism:

\[ U^{\{d\}} \cong \mathbb{C}[U]_d. \]

This follows by observing that if $U$ has basis $\{u_i\}$, then $U^{\{d\}}$ consists of all
tensors of the form

\[ \psi = \sum_{i_1, i_2, \ldots, i_d} \psi_{i_1 i_2 \ldots i_d}\, u_{i_1} \otimes u_{i_2} \otimes \cdots \otimes u_{i_d}, \]

where $\psi_{i_1 i_2 \ldots i_d}$ is invariant under permutations of indices. Now if $U^*$ has basis
$\{\zeta_i\}$, consider an arbitrary element of $\mathbb{C}[U]_d$:

\[ f = \sum_{i_1, i_2, \ldots, i_d} f_{i_1 i_2 \ldots i_d}\, \zeta_{i_1} \zeta_{i_2} \cdots \zeta_{i_d}. \]

Clearly $f_{i_1 i_2 \ldots i_d}$ is also invariant under permutations of indices. This
identification establishes the isomorphism. We define the canonical isomorphism
$\omega : U^{\{d\}} \to \mathbb{C}[U]_d$ as

\[ \omega(\psi) := \sum_{i_1, i_2, \ldots, i_d} \psi_{i_1 i_2 \ldots i_d}\, \zeta_{i_1} \zeta_{i_2} \cdots \zeta_{i_d}, \qquad (2.5) \]

with inverse

\[ \omega^{-1}(f) := \sum_{i_1, i_2, \ldots, i_d} f_{i_1 i_2 \ldots i_d}\, u_{i_1} \otimes u_{i_2} \otimes \cdots \otimes u_{i_d}. \]

By explicit computation

\[ \omega(\otimes^d g^t\, \psi) = \omega(\psi) \circ g = g^{-1} \omega(\psi), \qquad (2.6) \]

and

\[ \omega^{-1}(g^{-1} f) = \omega^{-1}(f \circ g) = \otimes^d g^t\, \omega^{-1}(f) \qquad (2.7) \]

for all $g \in GL(U)$, $f \in \mathbb{C}[U]_d$ and $\psi \in U^{\{d\}}$.
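The identity $\omega(\otimes^d g^t\, \psi) = \omega(\psi) \circ g$ can be checked directly in the smallest non-trivial case. The sketch below assumes $d = 2$ and $\dim U = 2$, with arbitrarily chosen integer data; the function names are illustrative and not taken from the thesis.

```python
def omega(psi):
    """Map a symmetric rank-2 tensor to the polynomial x -> sum_ij psi_ij x_i x_j."""
    n = len(psi)
    return lambda x: sum(psi[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def tensor_square_action(h, psi):
    """Components of (h ⊗ h) psi."""
    n = len(psi)
    return [[sum(h[a][i] * h[b][j] * psi[i][j] for i in range(n) for j in range(n))
             for b in range(n)] for a in range(n)]


def transpose(g):
    return [list(row) for row in zip(*g)]


def apply(g, x):
    """Matrix-vector product g x."""
    return [sum(g[i][j] * x[j] for j in range(len(x))) for i in range(len(g))]


psi = [[1, 2], [2, -3]]   # symmetric components psi_ij (example data)
g = [[2, 1], [0, 3]]
x = [5, -4]

lhs = omega(tensor_square_action(transpose(g), psi))(x)   # omega(g^t ⊗ g^t psi)(x)
rhs = omega(psi)(apply(g, x))                             # (omega(psi) ∘ g)(x)

if __name__ == "__main__":
    print(lhs, rhs)
```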

From these considerations we generalize to the case where $U = V^{\otimes m}$ and
establish the main result of this section:

Theorem 2.3.1. Consider integers $m, d, k, n$ with $md = kn$ and label the
occurrences of $V^{\{k^n\}}$ in the decomposition of $(V^{\otimes m})^{\{d\}}$ by an integer $a$. It
follows that

\[ \mathbb{C}[V^{\otimes m}]^{GL(n)}_d \cong \bigoplus_a V_a^{\{k^n\}}. \]

Proof. Suppose $f \in \mathbb{C}[V^{\otimes m}]^{GL(n)}_d$. We have

\[ \omega^{-1}(g^{-1} f) = \omega^{-1}(\det(g)^k f) = \det(g)^k\, \omega^{-1}(f)
   = \otimes^d g^t\, \omega^{-1}(f), \]

and hence the representation space $\mathrm{span}[\omega^{-1}(f)]$ provides a one-dimensional
representation of $GL(n)$. Conversely, suppose that $\mathrm{span}[\psi]$ with $\psi \in (V^{\otimes m})^{\{d\}}$
provides a one-dimensional representation of $GL(n)$ such that

\[ \otimes^d g\, \psi = \det(g)^k \psi. \]

Noting that $\det(g) = \det(g^t)$ it follows that

\[ \det(g)^k\, \omega(\psi) = \omega(\det(g)^k \psi) = \omega(\otimes^d g^t\, \psi) = g^{-1} \omega(\psi), \]

so we can conclude that $\omega(\psi) \in \mathbb{C}[V^{\otimes m}]^{GL(n)}_d$. □



2.3.2 Using Schur functions to count invariants 

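By the preceding theorems we conclude the following:

Theorem 2.3.2. The number of invariants in $\mathbb{C}[V^{\otimes m}]^{GL(n)}_d$ of weight $k$ is equal
to the number of occurrences of $\{k^n\}$ in the decomposition of $(\times^m \{1\})^{\otimes\{d\}}$.

We now consider the character of $\times^m GL(n)$ on $(V^{\otimes m})^\lambda$:

\[ s_\lambda(x^{(1)} x^{(2)} \cdots x^{(m)}) =
   \sum_{\sigma_1, \ldots, \sigma_m} \sum_{\nu_1, \ldots, \nu_{m-2}}
   \gamma^{\lambda}_{\sigma_1 \nu_1} \gamma^{\nu_1}_{\sigma_2 \nu_2} \cdots \gamma^{\nu_{m-2}}_{\sigma_{m-1} \sigma_m}\,
   s_{\sigma_1}(x^{(1)})\, s_{\sigma_2}(x^{(2)}) \cdots s_{\sigma_m}(x^{(m)}), \qquad (2.8) \]

where $(x^{(1)} x^{(2)} \cdots x^{(m)}) = (x^{(1)}_{i_1} x^{(2)}_{i_2} \cdots x^{(m)}_{i_m})_{1 \le i_a \le n}$. Now each term in (2.8) is
an irreducible character

\[ s_{\sigma_1}(x^{(1)})\, s_{\sigma_2}(x^{(2)}) \cdots s_{\sigma_m}(x^{(m)}), \]

with $|\sigma_a| = |\lambda|$ and multiplicity

\[ q(\sigma_1, \sigma_2, \ldots, \sigma_m; \lambda) := \sum_{\nu_1, \ldots, \nu_{m-2}}
   \gamma^{\lambda}_{\sigma_1 \nu_1} \gamma^{\nu_1}_{\sigma_2 \nu_2} \cdots \gamma^{\nu_{m-2}}_{\sigma_{m-1} \sigma_m}. \]

From the definition of the inner product

\[ q(\sigma_1, \sigma_2, \ldots, \sigma_m; \lambda) = \{\text{multiplicity of } \lambda \text{ in } \sigma_1 * \sigma_2 * \cdots * \sigma_m\}. \]

The dimension of each of these irreducible representations is equal to the
product of the dimensions of its component irreducible representations. To
identify invariant functions we are led to the following theorem:

Theorem 2.3.3. The number of weight $k$ invariants in $\mathbb{C}[V^{\otimes m}]^{\times^m GL(n)}_d$ is
equal to the number of occurrences of the Schur function $\{d\}$ in $*^m \{k^n\}$.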
2.4 Invariants of the general linear group

We have established that any one-dimensional representation of $GL(n)$ occurs
as a partition of the form $\{k^n\}$. This is because the columns of the partitions
correspond to the anti-symmetrization process of Young operators, and it is
clear that if we anti-symmetrize $n$ elements $n$ times then there will only be
a single independent element remaining. We now present a generic
scheme which allows us to generate the exact polynomial form of these
representations.

Consider the definition of the determinant of a matrix $g$:

\[ \det(g) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, g_{1\sigma(1)}\, g_{2\sigma(2)} \cdots g_{n\sigma(n)}. \]

By defining the (anti-symmetric) Levi-Civita tensor

\[ \epsilon := \sum_{i_1, i_2, \ldots, i_n} \epsilon_{i_1 i_2 \ldots i_n}\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_n}, \]

where $\epsilon_{\sigma(1)\sigma(2)\ldots\sigma(n)} := \mathrm{sgn}(\sigma)$ (with all other components zero), it follows
that the determinant can be expressed as

\[ \det(g) = \frac{1}{n!} \sum_{\substack{i_1, i_2, \ldots, i_n \\ j_1, j_2, \ldots, j_n}}
   g_{i_1 j_1}\, g_{i_2 j_2} \cdots g_{i_n j_n}\, \epsilon_{i_1 i_2 \ldots i_n}\, \epsilon_{j_1 j_2 \ldots j_n}. \qquad (2.9) \]

We now show that

\[ \epsilon' := g \otimes g \otimes \cdots \otimes g\, \epsilon = \det(g)\, \epsilon, \]

for all matrices $g$. In components we have

\[ \epsilon'_{i_1 i_2 \ldots i_n} = \sum_{j_1, j_2, \ldots, j_n} g_{i_1 j_1}\, g_{i_2 j_2} \cdots g_{i_n j_n}\, \epsilon_{j_1 j_2 \ldots j_n}, \]

and it is clear that $\epsilon'_{i_1 i_2 \ldots i_n}$ is completely anti-symmetric under interchange of
indices and hence must be proportional to $\epsilon_{i_1 i_2 \ldots i_n}$. Finally we use (2.9) to
conclude that

\[ \epsilon' = \det(g)\, \epsilon. \]
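The transformation property of the Levi-Civita tensor can be verified component by component. The following is a minimal sketch for $n = 3$ with an arbitrarily chosen integer matrix; the helper names are illustrative only.

```python
from itertools import permutations, product


def sign(perm):
    """Sign of a permutation given as a tuple, computed by counting inversions."""
    inversions = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm))
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def levi_civita(n):
    """Non-zero components of the rank-n Levi-Civita tensor."""
    return {p: sign(p) for p in permutations(range(n))}


def det(g):
    """Leibniz expansion of the determinant."""
    n = len(g)
    total = 0
    for p, s in levi_civita(n).items():
        term = s
        for i in range(n):
            term *= g[i][p[i]]
        total += term
    return total


def transformed_epsilon(g):
    """All components of (g ⊗ ... ⊗ g) epsilon."""
    n = len(g)
    eps = levi_civita(n)
    out = {}
    for idx in product(range(n), repeat=n):
        total = 0
        for p, s in eps.items():
            term = s
            for a in range(n):
                term *= g[idx[a]][p[a]]
            total += term
        out[idx] = total
    return out


g = [[2, 0, 1], [1, 3, -1], [0, 4, 5]]   # example matrix (an assumption)
eps = levi_civita(3)
eps_prime = transformed_epsilon(g)

if __name__ == "__main__":
    print(all(eps_prime[idx] == det(g) * eps.get(idx, 0) for idx in eps_prime))
```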

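Theorem 2.4.1. Consider a function $f : V^{\otimes n} \times V^{\otimes m} \to \mathbb{C}$ which satisfies the
conditions:

1. For fixed $\chi \in V^{\otimes n}$ we have $f \in \mathbb{C}[V^{\otimes m}]_d$. That is $f(\chi, c\psi) = c^d f(\chi, \psi)$.

2. For fixed $\psi \in V^{\otimes m}$ we have $f \in \mathbb{C}[V^{\otimes n}]_k$. That is $f(c\chi, \psi) = c^k f(\chi, \psi)$.

3. $f(\chi, \otimes^m g\, \psi) = f(\otimes^n g^t \chi, \psi)$.

The function $f_\epsilon : V^{\otimes m} \to \mathbb{C}$ given by

\[ f_\epsilon(\psi) := f(\epsilon, \psi) \]

then satisfies $f_\epsilon(\otimes^m g\, \psi) = \det(g)^k f_\epsilon(\psi)$.

Proof. We have

\[ f_\epsilon(\otimes^m g\, \psi) = f(\epsilon, \otimes^m g\, \psi) = f(\otimes^n g^t \epsilon, \psi)
   = f(\det(g)\epsilon, \psi) = \det(g)^k f(\epsilon, \psi) = \det(g)^k f_\epsilon(\psi). \]

□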

This theorem gives us some idea of how to explicitly construct invariants for 
the general linear group. The rest of this chapter will be devoted to the 
illustration of several examples. 



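2.4.1 Invariants of $GL(n)$ on $V^{\otimes m}$

For this case the number of invariants of $GL(n)$ on $V^{\otimes m}$ is given by the
multiplicity of $\{k^n\}$ in $(\times^m \{1\})^{\otimes\{d\}}$ with $nk = md$. Here we will consider $m = 2$
and the cases $n = 2, 3, 4$.



The case of $GL(2)$

In the case that $n = m = 2$, the possible degrees of the invariants are

\[ d = 1, 2, 3, 4, \ldots \]

and using Schur we find

\[ \begin{aligned}
(\{1\} \times \{1\})^{\otimes\{1\}} &\ni \{1^2\}, \\
(\{1\} \times \{1\})^{\otimes\{2\}} &\ni 2\{2^2\}, \\
(\{1\} \times \{1\})^{\otimes\{3\}} &\ni 2\{3^2\}, \\
(\{1\} \times \{1\})^{\otimes\{4\}} &\ni 3\{4^2\}, \\
(\{1\} \times \{1\})^{\otimes\{5\}} &\ni 3\{5^2\}, \\
(\{1\} \times \{1\})^{\otimes\{6\}} &\ni 4\{6^2\}.
\end{aligned} \]

At each degree the correct number of invariants can be built from

\[ \begin{aligned}
f_1(\psi) &:= \sum_{i_1, i_2} \psi_{i_1 i_2}\, \epsilon_{i_1 i_2} = (\psi_{12} - \psi_{21}), \\
f_2(\psi) &:= \sum_{i_1, i_2, j_1, j_2} \psi_{i_1 i_2} \psi_{j_1 j_2}\, \epsilon_{i_1 j_1} \epsilon_{i_2 j_2}
           = 2(\psi_{11}\psi_{22} - \psi_{12}\psi_{21}),
\end{aligned} \qquad (2.10) \]

which are non-zero, algebraically independent, and by inspection satisfy (2.4.1).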
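A short numerical check of these two invariants is sketched below. It is an illustration under stated assumptions: the integer test data are arbitrary, indices run over 0 and 1 rather than 1 and 2, and $f_2$ is computed from the double contraction with $\epsilon$ without any normalization factor, so it evaluates to twice the $2 \times 2$ determinant of the components.

```python
from itertools import product

EPS2 = {(0, 1): 1, (1, 0): -1}   # non-zero components of the rank-2 Levi-Civita tensor


def eps(i, j):
    return EPS2.get((i, j), 0)


def act(g, psi):
    """Components of (g ⊗ g) psi: psi'_{ab} = sum_{ij} g_{ai} g_{bj} psi_{ij}."""
    return [[sum(g[a][i] * g[b][j] * psi[i][j] for i in range(2) for j in range(2))
             for b in range(2)] for a in range(2)]


def det2(g):
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]


def f1(psi):
    return sum(psi[i][j] * eps(i, j) for i, j in product(range(2), repeat=2))


def f2(psi):
    return sum(psi[i1][i2] * psi[j1][j2] * eps(i1, j1) * eps(i2, j2)
               for i1, i2, j1, j2 in product(range(2), repeat=4))


psi = [[1, 2], [3, 5]]   # example components (an assumption)
g = [[2, 1], [1, 3]]

if __name__ == "__main__":
    print(f1(act(g, psi)), det2(g) * f1(psi))        # weight k = 1
    print(f2(act(g, psi)), det2(g) ** 2 * f2(psi))   # weight k = 2
```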
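The case of $GL(3)$

In the case that $n = 3$, $m = 2$, the possible degrees of invariants are

\[ d = 3, 6, 9, 12, \ldots \]

Computing plethysms in Schur gives

\[ \begin{aligned}
(\{1\} \times \{1\})^{\otimes\{3\}} &\ni 2\{2^3\}, \\
(\{1\} \times \{1\})^{\otimes\{6\}} &\ni 3\{4^3\}, \\
(\{1\} \times \{1\})^{\otimes\{9\}} &\ni 4\{6^3\}.
\end{aligned} \]

At each degree the correct number of invariants can be built from the two
$d = 3$ invariants:

\[ \begin{aligned}
f_1(\psi) :=& \sum_{i_1, i_2, j_1, j_2, k_1, k_2} \psi_{i_1 i_2} \psi_{j_1 j_2} \psi_{k_1 k_2}\, \epsilon_{i_1 j_1 k_1} \epsilon_{i_2 j_2 k_2} \\
 =&\ 3!\, \big( -\psi_{13}\psi_{22}\psi_{31} + \psi_{12}\psi_{23}\psi_{31} + \psi_{13}\psi_{21}\psi_{32} \\
  &\quad - \psi_{11}\psi_{23}\psi_{32} - \psi_{12}\psi_{21}\psi_{33} + \psi_{11}\psi_{22}\psi_{33} \big), \\
f_2(\psi) :=& \sum_{i_1, i_2, j_1, j_2, k_1, k_2} \psi_{i_1 i_2} \psi_{j_1 j_2} \psi_{k_1 k_2}\, \epsilon_{i_1 i_2 j_1} \epsilon_{j_2 k_1 k_2} \\
 =&\ \psi_{13}^2\psi_{22} - \psi_{12}\psi_{13}\psi_{23} - \psi_{13}\psi_{21}\psi_{23} + \psi_{11}\psi_{23}^2 \\
  &- 2\psi_{13}\psi_{22}\psi_{31} + 3\psi_{12}\psi_{23}\psi_{31} - \psi_{21}\psi_{23}\psi_{31} + \psi_{22}\psi_{31}^2 \\
  &- \psi_{12}\psi_{13}\psi_{32} + 3\psi_{13}\psi_{21}\psi_{32} - 2\psi_{11}\psi_{23}\psi_{32} - \psi_{12}\psi_{31}\psi_{32} \\
  &- \psi_{21}\psi_{31}\psi_{32} + \psi_{11}\psi_{32}^2 + \psi_{12}^2\psi_{33} \\
  &- 2\psi_{12}\psi_{21}\psi_{33} + \psi_{21}^2\psi_{33},
\end{aligned} \qquad (2.11) \]

which are non-zero, linearly independent and satisfy (2.4.1).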
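The two cubic invariants can be evaluated by brute-force contraction. The sketch below is an illustration, not the author's computation: it assumes the index contractions $\epsilon_{i_1 j_1 k_1}\epsilon_{i_2 j_2 k_2}$ for $f_1$ and $\epsilon_{i_1 i_2 j_1}\epsilon_{j_2 k_1 k_2}$ for $f_2$, with no normalization factors, and checks that both pick up $\det(g)^2$ under $\psi \mapsto g \otimes g\, \psi$.

```python
from itertools import permutations, product


def sign(perm):
    inversions = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm))
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


EPS3 = {p: sign(p) for p in permutations(range(3))}


def eps(i, j, k):
    return EPS3.get((i, j, k), 0)


def act(g, psi):
    """Components of (g ⊗ g) psi for 3 x 3 component arrays."""
    return [[sum(g[a][i] * g[b][j] * psi[i][j] for i in range(3) for j in range(3))
             for b in range(3)] for a in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def f1(psi):
    return sum(psi[i1][i2] * psi[j1][j2] * psi[k1][k2] * eps(i1, j1, k1) * eps(i2, j2, k2)
               for i1, i2, j1, j2, k1, k2 in product(range(3), repeat=6))


def f2(psi):
    return sum(psi[i1][i2] * psi[j1][j2] * psi[k1][k2] * eps(i1, i2, j1) * eps(j2, k1, k2)
               for i1, i2, j1, j2, k1, k2 in product(range(3), repeat=6))


psi = [[1, 2, 0], [-1, 3, 4], [2, 0, 1]]   # example components (an assumption)
g = [[1, 1, 0], [0, 2, 1], [1, 0, 1]]

if __name__ == "__main__":
    print(f1(act(g, psi)) == det3(g) ** 2 * f1(psi))
    print(f2(act(g, psi)) == det3(g) ** 2 * f2(psi))
```

With this normalization `f1` is six times the ordinary $3 \times 3$ determinant of the components.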
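The case of $GL(4)$

In the case that $n = 4$, $m = 2$, the possible degrees of the invariants are

\[ d = 2, 4, 6, \ldots \]

and Schur gives

\[ \begin{aligned}
(\{1\} \times \{1\})^{\otimes\{2\}} &\ni \{1^4\}, \\
(\{1\} \times \{1\})^{\otimes\{4\}} &\ni 3\{2^4\}, \\
(\{1\} \times \{1\})^{\otimes\{6\}} &\ni 3\{3^4\}, \\
(\{1\} \times \{1\})^{\otimes\{8\}} &\ni 6\{4^4\}.
\end{aligned} \]

The correct number of invariants can be constructed from three invariants of
degree $d = 2, 4, 4$ respectively:

\[ \begin{aligned}
f_1(\psi) &:= \sum_{i_1, i_2, j_1, j_2} \psi_{i_1 i_2} \psi_{j_1 j_2}\, \epsilon_{i_1 i_2 j_1 j_2}, \\
f_2(\psi) &:= \sum_{i_1, i_2, j_1, j_2, k_1, k_2, l_1, l_2} \psi_{i_1 i_2} \psi_{j_1 j_2} \psi_{k_1 k_2} \psi_{l_1 l_2}\, \epsilon_{i_1 j_1 k_1 l_1} \epsilon_{i_2 j_2 k_2 l_2}, \\
f_3(\psi) &:= \sum_{i_1, i_2, j_1, j_2, k_1, k_2, l_1, l_2} \psi_{i_1 i_2} \psi_{j_1 j_2} \psi_{k_1 k_2} \psi_{l_1 l_2}\, \epsilon_{i_1 i_2 j_1 k_1} \epsilon_{j_2 k_2 l_1 l_2},
\end{aligned} \qquad (2.12) \]

which by explicit expansion (either by hand or using a computer algebra
package) are non-zero, algebraically independent and satisfy (2.4.1).$^2$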



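2.4.2 Invariants of $\times^m GL(n)$ on $V^{\otimes m}$

We consider the existence of invariants $q : V^{\otimes m} \to \mathbb{C}$ which take the form

\[ q(g\psi) = \det(g)^k q(\psi), \]

for all $g = g_1 \otimes g_2 \otimes \cdots \otimes g_m$ with $g_a \in GL(n)$ for $1 \le a \le m$, where
$\det(g) := \det(g_1)\det(g_2)\cdots\det(g_m)$. We mimic
the construction of the previous section and give sufficient conditions for the
existence of such functions.

Theorem 2.4.2. Consider a function $q : (\times^m V^{\otimes n}) \times V^{\otimes m} \to \mathbb{C}$ which satisfies
the conditions:

1. For fixed $\psi \in V^{\otimes m}$ we have

\[ q(\chi_1, \ldots, c\chi_a, \ldots, \chi_m; \psi) = c^k q(\chi_1, \ldots, \chi_a, \ldots, \chi_m; \psi), \]

for each $1 \le a \le m$.

2. For fixed $\chi_a \in V^{\otimes n}$, $1 \le a \le m$, we have

\[ q(\chi_1, \ldots, \chi_m; c\psi) = c^d q(\chi_1, \ldots, \chi_m; \psi). \]

3. For all $g = g_1 \otimes g_2 \otimes \cdots \otimes g_m$ we have

\[ q(\chi_1, \ldots, \chi_m; g\psi) = q(\otimes^n g_1^t \chi_1, \ldots, \otimes^n g_m^t \chi_m; \psi). \]

The function $q_\epsilon : V^{\otimes m} \to \mathbb{C}$ given by

\[ q_\epsilon(\psi) := q(\epsilon, \epsilon, \ldots, \epsilon; \psi), \]

satisfies $q_\epsilon(g\psi) = \det(g)^k q_\epsilon(\psi)$ for all $g = g_1 \otimes g_2 \otimes \cdots \otimes g_m$.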



2 As the number of indices in these expressions is becoming prohibitively large, we will 
adopt a convention from now until the end of the thesis that, unless otherwise indicated, 
any indices that appear after a summation sign are to be summed over appropriate bounds. 
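Proof. We have

\[ \begin{aligned}
q_\epsilon(g\psi) &= q(\epsilon, \ldots, \epsilon; g\psi) \\
 &= q(\otimes^n g_1^t \epsilon, \ldots, \otimes^n g_m^t \epsilon; \psi) \\
 &= q(\det(g_1)\epsilon, \ldots, \det(g_m)\epsilon; \psi) \\
 &= \det(g_1)^k \det(g_2)^k \cdots \det(g_m)^k\, q(\epsilon, \ldots, \epsilon; \psi) \\
 &= \det(g)^k q_\epsilon(\psi).
\end{aligned} \]

□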

With these sufficient conditions in mind we will use the Schur functions to 
ascertain existence of these invariants and give examples of their exact form. 



The case of k = 1 

From (2.3.3) the existence of such invariants requires that for the $m$-fold inner
product we have:

\[ \{1^n\} * \{1^n\} * \cdots * \{1^n\} \ni \{n\}. \]

Now for even $m$ we have

\[ \{1^n\} * \{1^n\} * \cdots * \{1^n\} = \{n\} \]

and for odd $m$

\[ \{1^n\} * \{1^n\} * \cdots * \{1^n\} = \{1^n\}, \]

so that there exists a single invariant for each even $m$ and no invariants for
odd $m$.
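The even/odd pattern follows from the fact that $\{1^n\}$ labels the sign representation of $S_n$ and $\{n\}$ the trivial one, so the character of the $m$-fold inner product is $\mathrm{sgn}(\sigma)^m$. The sketch below checks this with the usual character orthogonality relation; the function name is an illustrative assumption.

```python
from itertools import permutations


def sign(perm):
    inversions = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm))
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def multiplicity_of_trivial(m, n):
    """Multiplicity of {n} in the m-fold inner product of {1^n}.

    Computed as (1/n!) * sum over S_n of sgn(sigma)^m, i.e. the inner product
    of the character sgn^m with the trivial character.
    """
    group = list(permutations(range(n)))
    return sum(sign(p) ** m for p in group) // len(group)


if __name__ == "__main__":
    for m in range(1, 5):
        print(m, multiplicity_of_trivial(m, 3))
```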

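For $m = 2$ and $n = 2, 3, 4$ these invariants are

\[ \begin{aligned}
\det{}_2(\psi) &= \sum \psi_{i_1 i_2} \psi_{j_1 j_2}\, \epsilon_{i_1 j_1} \epsilon_{i_2 j_2}, \\
\det{}_3(\psi) &= \sum \psi_{i_1 i_2} \psi_{j_1 j_2} \psi_{k_1 k_2}\, \epsilon_{i_1 j_1 k_1} \epsilon_{i_2 j_2 k_2}, \\
\det{}_4(\psi) &= \sum \psi_{i_1 i_2} \psi_{j_1 j_2} \psi_{k_1 k_2} \psi_{l_1 l_2}\, \epsilon_{i_1 j_1 k_1 l_1} \epsilon_{i_2 j_2 k_2 l_2},
\end{aligned} \qquad (2.13) \]

which can be seen to satisfy (2.4.2) and can be generalized in the obvious
manner for any $n$. (These polynomials should be distinguished from the
determinant of a matrix; although their functional form is identical to that of
the determinant, they arise as invariant functions on the linear space $V \otimes V$.)
For $m = 4$ and $n = 2, 3, 4$ we can define:

\[ \begin{aligned}
Q_2(\psi) &= \sum \psi_{i_1 i_2 i_3 i_4} \psi_{j_1 j_2 j_3 j_4}\, \epsilon_{i_1 j_1} \epsilon_{i_2 j_2} \epsilon_{i_3 j_3} \epsilon_{i_4 j_4}, \\
Q_3(\psi) &= \sum \psi_{i_1 i_2 i_3 i_4} \psi_{j_1 j_2 j_3 j_4} \psi_{k_1 k_2 k_3 k_4}\, \epsilon_{i_1 j_1 k_1} \epsilon_{i_2 j_2 k_2} \epsilon_{i_3 j_3 k_3} \epsilon_{i_4 j_4 k_4}, \\
Q_4(\psi) &= \sum \psi_{i_1 i_2 i_3 i_4} \psi_{j_1 j_2 j_3 j_4} \psi_{k_1 k_2 k_3 k_4} \psi_{l_1 l_2 l_3 l_4}\, \epsilon_{i_1 j_1 k_1 l_1} \epsilon_{i_2 j_2 k_2 l_2} \epsilon_{i_3 j_3 k_3 l_3} \epsilon_{i_4 j_4 k_4 l_4},
\end{aligned} \qquad (2.14) \]

which can also be seen to satisfy (2.4.2) and can be generalized in the obvious
way for arbitrary $n$. We refer to these invariants as quangles.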
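The invariance of the quangle can be tested numerically for $n = 2$. The sketch below is illustrative: it assumes that $Q_2$ contracts each of the four index slots of two copies of $\psi$ with one $\epsilon$, and it uses arbitrary integer data with tensors stored as dictionaries keyed by index tuples.

```python
from itertools import product

EPS2 = {(0, 1): 1, (1, 0): -1}
INDICES = list(product(range(2), repeat=4))


def det2(g):
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]


def act(gs, psi):
    """Components of (g1 ⊗ g2 ⊗ g3 ⊗ g4) psi."""
    out = {}
    for a in INDICES:
        out[a] = sum(gs[0][a[0]][i[0]] * gs[1][a[1]][i[1]]
                     * gs[2][a[2]][i[2]] * gs[3][a[3]][i[3]] * psi[i]
                     for i in INDICES)
    return out


def quangle(psi):
    """Q_2(psi): two copies of psi contracted slot by slot with epsilon."""
    total = 0
    for i in INDICES:
        for j in INDICES:
            weight = 1
            for a in range(4):
                weight *= EPS2.get((i[a], j[a]), 0)
            total += weight * psi[i] * psi[j]
    return total


# Arbitrary integer components and group elements (assumptions for illustration).
values = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3]
psi = dict(zip(INDICES, values))
gs = [[[1, 2], [0, 1]], [[2, 1], [1, 1]], [[1, 0], [3, 2]], [[0, -1], [1, 4]]]

ghz = {idx: 0 for idx in INDICES}
ghz[(0, 0, 0, 0)] = 1
ghz[(1, 1, 1, 1)] = 1

if __name__ == "__main__":
    weight = 1
    for g in gs:
        weight *= det2(g)
    print(quangle(act(gs, psi)) == weight * quangle(psi))   # weight k = 1
```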
The case of $\times^m GL(2)$ and $k = 2$

For $m = 2, 3, 4$ Schur shows that

\[ \begin{aligned}
\{2^2\} * \{2^2\} &\ni \{4\}, \\
\{2^2\} * \{2^2\} * \{2^2\} &\ni \{4\}, \\
\{2^2\} * \{2^2\} * \{2^2\} * \{2^2\} &\ni 3\{4\}.
\end{aligned} \]

At $m = 2$ the required invariant is the pointwise product of $\det_2$ with itself,
whereas at $m = 3$ we have the tangle$^3$.
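The explicit form of the tangle is not reproduced in this text, so the sketch below rests on an assumption: it uses the standard contraction for the three-qubit tangle (proportional to Cayley's hyperdeterminant), namely $\sum \psi_{i_1 i_2 i_3} \psi_{j_1 j_2 j_3} \psi_{k_1 k_2 k_3} \psi_{l_1 l_2 l_3}\, \epsilon_{i_1 j_1} \epsilon_{i_2 j_2} \epsilon_{k_1 l_1} \epsilon_{k_2 l_2} \epsilon_{i_3 k_3} \epsilon_{j_3 l_3}$, which may differ from the author's expression by normalization or index labelling. The code checks the weight $k = 2$ transformation law on arbitrary integer data.

```python
from itertools import product

EPS2 = {(0, 1): 1, (1, 0): -1}
INDICES = list(product(range(2), repeat=3))


def eps(i, j):
    return EPS2.get((i, j), 0)


def det2(g):
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]


def act(gs, psi):
    """Components of (g1 ⊗ g2 ⊗ g3) psi."""
    return {a: sum(gs[0][a[0]][i[0]] * gs[1][a[1]][i[1]] * gs[2][a[2]][i[2]] * psi[i]
                   for i in INDICES)
            for a in INDICES}


def tangle(psi):
    """Degree-4 contraction assumed here as the form of the tangle (see lead-in)."""
    total = 0
    for i, j, k, l in product(INDICES, repeat=4):
        weight = (eps(i[0], j[0]) * eps(i[1], j[1])
                  * eps(k[0], l[0]) * eps(k[1], l[1])
                  * eps(i[2], k[2]) * eps(j[2], l[2]))
        if weight:
            total += weight * psi[i] * psi[j] * psi[k] * psi[l]
    return total


def state(**components):
    """Build a tensor from keyword arguments such as s000=1, s111=1 (helper for examples)."""
    psi = {idx: 0 for idx in INDICES}
    for key, value in components.items():
        psi[tuple(int(c) for c in key[1:])] = value
    return psi


ghz = state(s000=1, s111=1)
w_state = state(s001=1, s010=1, s100=1)
psi = dict(zip(INDICES, [2, -1, 3, 1, 0, 4, -2, 5]))
gs = [[[1, 2], [0, 1]], [[2, 1], [1, 1]], [[1, 0], [3, 2]]]

if __name__ == "__main__":
    weight = (det2(gs[0]) * det2(gs[1]) * det2(gs[2])) ** 2
    print(tangle(act(gs, psi)) == weight * tangle(psi))
```

With this normalization the GHZ-type tensor gives a non-zero value while the W-type tensor gives zero, which is the familiar behaviour of the tangle.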

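At $m = 4$, the pointwise product of $Q_2$ with itself forms a $k = 2$ invariant and
we have the additional invariants:

\[ \sum \psi_{i_1 i_2 i_3 i_4} \psi_{j_1 j_2 j_3 j_4} \psi_{k_1 k_2 k_3 k_4} \psi_{l_1 l_2 l_3 l_4}\,
   \epsilon_{i_1 j_1} \epsilon_{i_2 j_2} \epsilon_{k_1 l_1} \epsilon_{k_2 l_2} \epsilon_{i_3 k_3} \epsilon_{i_4 k_4} \epsilon_{j_3 l_3} \epsilon_{j_4 l_4}, \]

\[ \sum \psi_{i_1 i_2 i_3 i_4} \psi_{j_1 j_2 j_3 j_4} \psi_{k_1 k_2 k_3 k_4} \psi_{l_1 l_2 l_3 l_4}\,
   \epsilon_{i_1 j_1} \epsilon_{i_2 l_2} \epsilon_{i_3 l_3} \epsilon_{i_4 k_4} \epsilon_{j_2 k_2} \epsilon_{j_3 k_3} \epsilon_{j_4 l_4} \epsilon_{k_1 l_1}, \]

which satisfy (2.4.2) and can be shown to be non-zero and algebraically
independent.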

The case of $\times^m GL(3)$ and $k = 2$

For $m = 2, 3, 4$ Schur shows that

\[ \begin{aligned}
\{2^3\} * \{2^3\} &\ni \{6\}, \\
\{2^3\} * \{2^3\} * \{2^3\} &\ni \{6\}, \\
\{2^3\} * \{2^3\} * \{2^3\} * \{2^3\} &\ni 4\{6\}.
\end{aligned} \]

At $m = 2$, the pointwise product of $\det_3$ with itself forms a $k = 2$ invariant
and at $m = 3$ the tangle can be generalized to the $n = 3$ case:

\[ T_3(\psi) = \sum \psi_{i_1 i_2 i_3} \psi_{j_1 j_2 j_3} \psi_{k_1 k_2 k_3} \psi_{l_1 l_2 l_3} \psi_{m_1 m_2 m_3} \psi_{n_1 n_2 n_3}
   \cdot \epsilon_{i_1 j_1 k_1} \epsilon_{j_2 k_2 l_2} \epsilon_{k_3 l_3 m_3} \epsilon_{l_1 m_1 n_1} \epsilon_{m_2 n_2 i_2} \epsilon_{n_3 i_3 j_3}, \]

which by explicit expansion can be shown to be non-zero.
The invariants at $m = 4$ remain uninvestigated.

3 The tangle is known and used in physics to analyse multiparticle entanglement in
quantum mechanics. This will be reviewed in Chapter 3.
The case of $\times^m GL(4)$ and $k = 2$

For $m = 2, 3, 4$ Schur shows that

\[ \begin{aligned}
\{2^4\} * \{2^4\} &\ni \{8\}, \\
\{2^4\} * \{2^4\} * \{2^4\} &\ni \{8\}, \\
\{2^4\} * \{2^4\} * \{2^4\} * \{2^4\} &\ni 7\{8\}.
\end{aligned} \]

At $m = 2$ the pointwise product of $\det_4$ with itself is a $k = 2$ invariant and at
$m = 3$ the tangle can again be generalized, giving the invariant (2.17), which is
shown to be non-zero by explicit expansion.
The invariants at $m = 4$ remain uninvestigated.



2.5 Closing remarks

In this chapter we have given a review of the use of character theory to
build the irreducible representations of the general linear group. We have
demonstrated the concrete connection between the one-dimensional
representations and the classical invariants, and have presented theorems that allow
us to count these invariants at given degree $d$ and weight $k$.


Chapter 3 
Entanglement and phylogenetics 



Stochastic methods that model character distributions in aligned sequences are
part of the standard armoury of phylogenetic analysis [191 HH EH EI]. The
evolutionary relationships are usually represented as a bifurcating tree directed
in time. It is remarkable that there is a strong conceptual and mathematical
analogy between the construction of phylogenetic trees using stochastic
methods and the process of scattering in particle physics [21]. It is the purpose of
the present chapter to show that there is much potential in taking an algebraic,
group theoretical approach to the problem where the inherent symmetries of
the system can be fully appreciated and utilized.

Entanglement is of considerable interest in physics and there has been much
effort to elucidate the nature of this curious physical phenomenon [HI HH [2HI
[38] [62]. Entanglement has its origin in the manner in which the state
probabilities of a quantum mechanical system must be constructed from the individual
state probabilities of its various subsystems. Whenever there are global con- 
served quantities, such as spin, there exist entangled states where the choice 
of measurement of one subsystem can affect the measurement outcome of an- 
other subsystem no matter how spatially separated the two subsystems are. 
This curious physical property is represented mathematically by nonseparable 
tensor states. Remarkably, if the pattern frequencies of phylogenetic analysis 
are interpreted in a tensor framework it is possible to show that the branching 
process itself introduces entanglement into the state. In the context of phylo- 
genetics this element of entanglement corresponds to nothing other than that 
of phylogenetic relation. This is a mathematical curiosity that can be studied 
using methods from quantum physics. This is a novel way of approaching 
phylogenetic analysis which has not been explored before. 
This chapter will begin by establishing the formalism of quantum mechanics 
and introducing the concept of entanglement through an elementary example. 
A short review of the use of group invariant functions to analyse entanglement 
will be presented. The stochastic model of a phylogenetic tree will then be 
developed in its standard form, followed by a discussion which establishes a 
presentation of this model in the form of a group action on a tensor product 
space as used in quantum mechanics. The invariant functions used to study 
entanglement will then be examined in the context of phylogenetic trees. 
Note: Elements of this chapter are extracted from [59].



3.1 Quantum mechanics 

The formalism of the quantum mechanical description of physical systems 
amounts to four fundamental postulates. 

Postulate 1. The mathematical description of any physical system occurs as
a state vector $\psi$ in a complex vector space $V$ which, together with an inner
product, is known as a Hilbert space $\mathcal{H} = (V, (\cdot\,,\cdot))$.

For a given physical system it is not a priori apparent exactly how the Hilbert 
space should be chosen. As will be elaborated later, a basic property of quan- 
tum mechanics is that it is not possible to determine (in practice or in prin- 
ciple) the exact and complete configuration of a physical system. Thus, the 
Hilbert space is chosen not to represent all possible configurations of the sys- 
tem, but rather to represent whichever part is observable and under consid- 
eration in a given experimental setup. For example the full description of an 
electron is given by the tensor product of the representation space of the spin,
$\mathbb{C}^2$, with that of the representation space of spatial position, square integrable
functions $\{f : \mathbb{R}^3 \to \mathbb{C}\}$. However, one is often only interested in the spin
degrees of freedom of the system and simply ignores the position component 
of the state vector. 

For our purposes it will be enough to consider only the case where $\mathcal{H}$ is the
finite dimensional vector space with inner product given in terms of notation
from Chapter [5] as

\[ (\psi, \varphi) = \psi(\varphi). \]

Postulate 2. The dynamical evolution of any physical system is governed by
the linear equation

\[ i\hbar \frac{d\psi}{dt} = H(t)\psi, \qquad (3.1) \]

where $\hbar$ is the reduced Planck constant and $H(t)$ is a Hermitian operator:

\[ (\psi, H\varphi) = (H\psi, \varphi), \]

known as the Hamiltonian. Completely equivalently, the dynamical evolution
is described by solutions of (3.1):

\[ \psi(t_2) = U(t_2, t_1)\psi(t_1), \]

where $U(t_2, t_1)$ is a unitary operator

\[ (U\psi, U\varphi) = (\psi, \varphi). \]
From this postulate it is not apparent how the Hamiltonian should be chosen
in any particular case. Historically, Dirac formalized the idea of classical
analogy where the Hamiltonian is interpreted as the total energy of the system
[13]. However, this procedure is limited to systems which have a classical
counterpart and the general case is left to the modern quantum physicist.

Postulate 3. An observable of a physical system is described by a Hermitian
operator $A$ with associated eigenvalues $\{a_1, a_2, \ldots\}$ and eigenspaces defined by
the projection operators $\{P_1, P_2, \ldots\}$. If the state vector before measurement
is $\psi$ then the probability of the result $a_i$ is given by

\[ \mathbb{P}(a_i) = (\psi, P_i \psi), \]

and the state after measurement is $\psi' = P_i \psi$.

From this definition it is apparent that $U$ must be unitary to preserve total
probability. We will follow the standard procedure of normalizing the state
vector:

\[ (\psi, \psi) = 1. \]

Postulate 4. The state space of a composite of $m$ quantum systems with
individual state spaces $\mathcal{H}_1, \mathcal{H}_2, \ldots, \mathcal{H}_m$ is given by the tensor product:

\[ \mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots \otimes \mathcal{H}_m. \]

From this definition it may seem that the state vector of a composite system
should be expressed as the product state

\[ \psi = \psi^{(1)} \otimes \psi^{(2)} \otimes \cdots \otimes \psi^{(m)}, \qquad (3.2) \]

where $\psi^{(a)} \in \mathcal{H}_a$ is the state vector of each individual system. However, for
the general case, there are physical reasons why there must exist states which
cannot be written in the form (3.2). We will explore these states and their
curious properties in the next section.

3.1.1 Spin $\tfrac{1}{2}$ and entanglement

One way to proceed in the search for the appropriate state space $\mathcal{H}$ is to study
the representation spaces of the irreducible representations of a symmetry
group of a physical system. For the case of three dimensional Euclidean space,
consider the symmetry group of proper rotations; the special orthogonal group
$SO(3)$. The irreducible representations of $SO(3)$ are labelled by the spin
quantum numbers $s = \{0, \tfrac{1}{2}, 1, \tfrac{3}{2}, 2, \tfrac{5}{2}, \ldots\}$ (see [02]). Here we will study the
case $s = \tfrac{1}{2}$ where the representation is two-dimensional: $\mathcal{H} = \mathbb{C}^2$, and a state
vector is referred to as a qubit. The physics of the spin of a qubit is captured
by considering an orthonormal basis for $\mathbb{C}^2$ as $\{z_+, z_-\}$ and introducing the
observable $S_z$ satisfying

\[ S_z z_+ = \tfrac{1}{2} z_+, \qquad S_z z_- = -\tfrac{1}{2} z_-, \]

so that the states $\psi^+ := z_+$ and $\psi^- := z_-$ are eigenvectors of the spin operator.
Analogously, we can define the $x$ basis $\{x_+, x_-\}$ (or any other orthonormal
basis) by rotating the $z$ basis using the group element of the two-dimensional
representation of $SO(3)$ which corresponds to the appropriate physical
rotation. In particular, we have

\[ x_+ = \tfrac{1}{\sqrt{2}}(z_+ + z_-), \qquad x_- = \tfrac{1}{\sqrt{2}}(z_+ - z_-). \]

The measurement operators are then defined as being the projection operators
onto the appropriate basis vectors. For instance the projection operators for
spin in the $z$ direction satisfy

\[ P_z^+ z_+ = z_+, \quad P_z^+ z_- = 0, \quad P_z^- z_+ = 0, \quad P_z^- z_- = z_-. \]

A generic qubit can be written as

\[ \psi = c_1 z_+ + c_2 z_-. \]

Introduce the random variable $A_z \in \{+1, -1\}$ to correspond to the value of
the spin along the $z$ axis, and we have

\[ \mathbb{P}(A_z = 1) = (\psi, P_z^+ \psi) = |c_1|^2, \]

and

\[ \mathbb{P}(A_z = -1) = (\psi, P_z^- \psi) = |c_2|^2. \]

Now we turn our attention to composite states of $m$ qubits where the state
space becomes

\[ \mathcal{H} = (\mathbb{C}^2)^{\otimes m}. \]

The most general state can be expressed as

\[ \psi = \sum \psi_{i_1 i_2 \ldots i_m}\, e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_m}, \]

so that the state is specified by $2^m$ complex numbers $\psi_{i_1 i_2 \ldots i_m}$. In the case
where $\psi$ can be expressed in the form of a product state, we have

\[ \psi_{i_1 i_2 \ldots i_m} = \psi^{(1)}_{i_1} \psi^{(2)}_{i_2} \cdots \psi^{(m)}_{i_m}, \]

and we see that the state is specified by $2m$ complex numbers. The difference
in these parameter counts between the general state and the product state is
the origin of entanglement.

To illustrate the simplest example of entanglement consider the case of a spin
zero particle splitting into two spin $\tfrac{1}{2}$ qubits labelled as $A$ and $B$. To ensure
that the total spin is zero, it must be the case that the total state is

\[ \psi = \tfrac{1}{\sqrt{2}}(z_+ \otimes z_- - z_- \otimes z_+), \]

which ensures that $(S_z \otimes 1 + 1 \otimes S_z)\psi = 0$. We introduce the random variables
$A_z$ for particle $A$ and $B_z$ for particle $B$. For the state $\psi$, the measurement of
spins of $A$ and $B$ along the $z$ axis is associated with the probabilities

\[ \mathbb{P}(A_z = 1, B_z = 1) = \mathbb{P}(A_z = -1, B_z = -1) = 0, \]
\[ \mathbb{P}(A_z = 1, B_z = -1) = \mathbb{P}(A_z = -1, B_z = 1) = 1/2, \]

\[ \mathbb{P}(A_z = 1) = \mathbb{P}(A_z = -1) = 1/2, \]

and

\[ \mathbb{P}(B_z = 1) = \mathbb{P}(B_z = -1) = 1/2. \]

Now if we consider the same state but with spin measurements taken along
the $x$ axis, it is a simple exercise to show that, up to an overall sign,

\[ \psi = \tfrac{1}{\sqrt{2}}(x_+ \otimes x_- - x_- \otimes x_+). \]

If we were to go ahead and compute the various probabilities associated
with the observable $S_x$ we would come to the same probabilities as above.
That is, the spins of $A$ and $B$ are always opposite, giving $A_x = 1, B_x = -1$
with probability $\tfrac{1}{2}$ and $A_x = -1, B_x = +1$ with probability $\tfrac{1}{2}$. One can go
further and show that this is true for any orthonormal basis of $\mathbb{C}^2$. This
implies that no matter which axis the spins are measured along, the outcome
at $A$ is always the negative of the outcome at $B$. These probabilities have
been amply confirmed by experiment.
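The perfect anti-correlation along both axes can be reproduced with a few lines of arithmetic. The sketch below is illustrative and restricted to real amplitudes: the singlet state is stored as a $2 \times 2$ array of components in the $z$ basis, and the $x$ basis vectors are taken to be $(z_+ \pm z_-)/\sqrt{2}$, which is an assumed sign convention.

```python
import math


def joint_probabilities(state, basis):
    """P(A=a, B=b) when both qubits are measured in the same real orthonormal basis.

    `state[i][j]` are the (real) components in the z basis; `basis` maps each
    outcome (+1 or -1) to the z-basis components of the corresponding vector.
    """
    probs = {}
    for a, ea in basis.items():
        for b, eb in basis.items():
            amplitude = sum(ea[i] * eb[j] * state[i][j]
                            for i in range(2) for j in range(2))
            probs[(a, b)] = amplitude ** 2
    return probs


r = 1 / math.sqrt(2)
singlet = [[0.0, r], [-r, 0.0]]          # (z+ ⊗ z- - z- ⊗ z+) / sqrt(2)
z_basis = {+1: (1.0, 0.0), -1: (0.0, 1.0)}
x_basis = {+1: (r, r), -1: (r, -r)}

p_z = joint_probabilities(singlet, z_basis)
p_x = joint_probabilities(singlet, x_basis)

if __name__ == "__main__":
    print(p_z)
    print(p_x)
```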

A problem arises if one wishes to interpret the probabilities of the formalism 
of quantum mechanics as representing our ignorance of the full state of the 
physical system. Such a description of these events would require that at the 
moment of splitting, each particle actually carries the requisite information as 
how to respond to a spin measurement on an arbitrary axis, and somehow this 
information is unobservable or hidden from us. This additional information 
over and above the state vector was historically coined the hidden variables. 
However, Bell showed that it is actually impossible to specify the required 
hidden variables [7] and thus it is not possible to interpret the probabilities as 
simply representing our ignorance of the system. This implies that quantum 
mechanics requires that the physical world is probabilistic in an intrinsic way. 
An alternative way out of this predicament is to assume that there is a non- 
local communication between particles A and B, which ensures that spins are 
opposite along any axis. However, at the moment of measurement, A and B 
could be separated by a very large distance! Thus the entanglement leads us 
to the dilemma of having to accept one of the following: 
• Quantum systems have an essentially non-local property.

• The probabilities in quantum mechanics do not just indicate our igno- 
rance of the configuration of a physical system, but are an essential part 
of physical reality. 



Einstein was unhappy with both options, and never made his peace with the
quantum theory that he was so instrumental in constructing. This is because
the first grossly violates the spirit, if not the detail, of special relativity, and
the second implies that Einstein's contention that "God does not play dice"
cannot be true.

Recall that the conditional probability that the random variable $A = x$ given
that $B = y$ is defined to be

\[ \mathbb{P}(A = x \mid B = y) := \frac{\mathbb{P}(A = x, B = y)}{\mathbb{P}(B = y)}. \]

The random variables $A$ and $B$ are said to be stochastically independent [18]
if and only if

\[ \mathbb{P}(A = x, B = y) = \mathbb{P}(A = x)\, \mathbb{P}(B = y), \]

from which it would follow that

\[ \mathbb{P}(A = x \mid B = y) = \mathbb{P}(A = x), \]

which motivates the definition. (This notion of stochastic independence can
be extended to multiple random variables. For details see Feller [18].)
In quantum mechanics, stochastic independence is implied if the state is a
product $\psi = \varphi^{(1)} \otimes \varphi^{(2)}$. For if the state is a product state, we have

\[ \mathbb{P}(A = i, B = j) = (\varphi^{(1)}, P_i \varphi^{(1)})\, (\varphi^{(2)}, P_j \varphi^{(2)})
   = \mathbb{P}(A = i)\, \mathbb{P}(B = j). \]

In what follows we will equate entanglement with this notion of stochastic
dependence.
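The link between product states and stochastic independence can be illustrated with exact rational arithmetic. The sketch below assumes real, unnormalized component arrays measured in a fixed basis, so that $\mathbb{P}(A = i, B = j)$ is proportional to $\psi_{ij}^2$; the function names are illustrative.

```python
from fractions import Fraction


def joint_distribution(state):
    """P(A=i, B=j) = psi_ij^2 / norm for a real two-qubit state in a fixed basis."""
    norm = sum(v * v for row in state for v in row)
    return [[Fraction(v * v, norm) for v in row] for row in state]


def is_independent(p):
    """Check P(A=i, B=j) == P(A=i) P(B=j) for all outcomes."""
    p_a = [sum(row) for row in p]
    p_b = [sum(col) for col in zip(*p)]
    return all(p[i][j] == p_a[i] * p_b[j] for i in range(2) for j in range(2))


# psi_ij = phi1_i * phi2_j with phi1 = (1, 2) and phi2 = (3, 4): a product state.
product_state = [[a * b for b in (3, 4)] for a in (1, 2)]
# Unnormalized singlet state: not a product state.
singlet = [[0, 1], [-1, 0]]

if __name__ == "__main__":
    print(is_independent(joint_distribution(product_state)))
    print(is_independent(joint_distribution(singlet)))
```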



3.1.2 Orbit classes and invariants 



We have seen that a quantum system exhibits entanglement if the state vector 
cannot be written as a product. Mathematically one would like to partition 
the set of entangled state vectors into equivalence classes which capture the 
essential property of entanglement. A systematic approach to the classifica- 
tion problem is to study the orbit classes of the tensor product space under a 
group action which is designed to preserve the essential non-local properties 
of entanglement. The orbit of an element $\psi \in \mathcal{H}$ under the group action $G$ is
defined as the set of elements $\{\psi' = g\psi \text{ for some } g \in G\}$.
In quantum physics the appropriate group action is known to be the set of
SLOCC operators (Stochastic Local Operations with Classical Communication)
[TU EH [38] HHJ H5]. Mathematically SLOCC operators correspond
to the ability to transform the individual parts of the tensor product space
$\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots \otimes \mathcal{H}_m$ with arbitrary invertible, linear operations. These
operators are expressed by group elements of the form

\[ g = g_1 \otimes g_2 \otimes \cdots \otimes g_m, \]

where $m$ is the number of individual spaces making up the tensor product,
and $g_i \in GL(\mathcal{H}_i)$.

The task is to identify the orbit classes of a given tensor product space under
the general set of SLOCC operators. A powerful tool in this analysis is the
construction of the invariant functions $\mathbb{C}[\mathcal{H}]^G$. By definition these invariants
are constant, up to powers of the determinant, on each orbit class of $\mathcal{H}$. It
can be shown that there exists (under the action of the general linear group
at least), a finite set of elements which generate the full set of invariants on a
given linear space. It can also be shown that the set of orbit classes of a given
linear space can be completely classified given a full set of invariants on that
space [46].

In what follows we study the orbit class problem for the state space of two
qubits and then that of three qubits.

3.1.3 Two qubits and the concurrence

Using the notation of Chapter 2 the concurrence is defined using (2.13):

\[ \mathcal{C} = \det{}_2(\psi). \]

We wish to construct the orbit classes of $\mathcal{H} = \mathbb{C}^2 \otimes \mathbb{C}^2$ under the group
action $GL(\mathbb{C}^2) \times GL(\mathbb{C}^2)$. Any state $\psi \in \mathcal{H}$ can be expressed using the four
parameters $\psi_{i_1 i_2}$, which in turn can be arranged as a matrix $M = [\psi_{i_1 i_2}]$. Under
the group transformation

\[ \psi \to g_1 \otimes g_2\, \psi, \]

the corresponding matrix transformation is

\[ M \to g_1 M g_2^t. \]

Hence we can answer the orbit class problem by taking a canonical $2 \times 2$ matrix
$X$ and considering the set of matrices $\{M = AXB;\ A, B \in GL(\mathbb{C}^2)\}$.

Theorem 3.1.1. The vector space $V \otimes V$ where $V = \mathbb{C}^2$ has three orbits under
the group action $GL(2) \times GL(2)$. Under the identification $M = [\psi_{i_1 i_2}]$ for all
$\psi$ the orbits are characterized by the following canonical forms:

(i) Null-orbit $X = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$,

(ii) Separable-orbit $Y = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$,

(iii) Entangled-orbit $Z = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

The separable and entangled orbits can be distinguished by the determinant
function.

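Proof. (i) The null-orbit has only one member, the null vector; it is of course
unchanged by the group action.

(ii) We are required to show that the set of 2 x 2 matrices
$\mathcal{M} = \{S : S = AYB;\ A, B \in GL(V)\}$ consists of all non-zero matrices such that
$\det(S) = 0$. Every member of $\mathcal{M}$ is non-zero and has vanishing determinant, since
$\det(AYB) = \det(A)\det(Y)\det(B) = 0$. Conversely, take a general non-zero matrix
$S = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $ad - bc = 0$. The matrices

$$S' := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} S = \begin{pmatrix} c & d \\ a & b \end{pmatrix}, \qquad
S'' := S \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} b & a \\ d & c \end{pmatrix}, \qquad \text{and} \qquad
S''' := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} S \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} d & c \\ b & a \end{pmatrix}$$

are obtained from $S$ by invertible row and column exchanges, so $S$ belongs to
$\mathcal{M}$ if and only if they do. So without loss of generality we can take $a \neq 0$, and it is
an easy computation to show that, for example,

$$S = \begin{pmatrix} a & 0 \\ c & 1 \end{pmatrix} Y \begin{pmatrix} 1 & b/a \\ 0 & 1 \end{pmatrix},$$

so that $\mathcal{M}$ is the set of non-zero 2 x 2 matrices with vanishing determinant.

(iii) Clearly any 2 x 2 matrix $N$ with non-zero determinant can be written as

$N = AZB$ where $A, B \in GL(\mathbb{C}^2)$. □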
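
The orbit classification of Theorem 3.1.1 can be illustrated numerically. The following is a minimal sketch, assuming a state is stored as its 2 x 2 coefficient matrix; the helper names `matmul2`, `det2` and `orbit` and the matrices `A` and `B` are chosen here for illustration and do not come from the text.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def orbit(m, tol=1e-12):
    """Name the GL(2) x GL(2) orbit of the state with coefficient matrix m."""
    if all(abs(x) <= tol for row in m for x in row):
        return "null"
    return "entangled" if abs(det2(m)) > tol else "separable"


Y = [[1, 0], [0, 0]]   # canonical separable form
Z = [[1, 0], [0, 1]]   # canonical entangled form
A = [[2, 1], [1, 1]]   # illustrative invertible matrix, det = 1
B = [[1, 3], [0, 2]]   # illustrative invertible matrix, det = 2

S = matmul2(matmul2(A, Y), B)   # image of Y under the group action
N = matmul2(matmul2(A, Z), B)   # image of Z under the group action
print(orbit(S), orbit(N), det2(N))
```

The determinant of `N` equals `det2(A) * det2(B)`, which is the relative invariance of the determinant under the group action.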

Corollary 3.1.2. The orbits of $\mathcal{H} = \mathbb{C}^2 \otimes \mathbb{C}^2$ under $SL(\mathbb{C}^2) \times SL(\mathbb{C}^2)$ are
labelled by the determinant function $\det[\phi(h)]$.

For further discussion see [SI [HI [3S].



3.1.4 Three qubits and the tangle 

It is known that there are six orbit classes of $\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2$ under the action
$GL(\mathbb{C}^2) \times GL(\mathbb{C}^2) \times GL(\mathbb{C}^2)$. These orbit classes can be distinguished by
functions of the concurrence and another relative invariant known as the tangle
HUES!.

We begin by defining three partial concurrence operations as



(3.3) 



From these definitions it is easy to see that 



$$\mathcal{C}_1(\psi') := \mathcal{C}_1(g_1 \otimes g_2 \otimes g_3\, \psi) = [\det(g_2)\det(g_3)]\; g_1 \otimes g_1\, \mathcal{C}_1(\psi),$$

with similar expressions for $\mathcal{C}_2$ and $\mathcal{C}_3$.
The tangle is an invariant satisfying 



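$$T(\psi') = [\det(g_1)\det(g_2)\det(g_3)]^2\, T(\psi),$$

and from (2.15) can be written in the form

$$T(\psi) = \psi_{a_1 a_2 a_3}\psi_{b_1 b_2 b_3}\psi_{c_1 c_2 c_3}\psi_{d_1 d_2 d_3}\;
\epsilon_{a_1 b_1}\epsilon_{a_2 b_2}\epsilon_{c_1 d_1}\epsilon_{c_2 d_2}\epsilon_{a_3 c_3}\epsilon_{d_3 b_3}.$$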

The six orbit classes are described by the completely disentangled states

$$\psi = \varphi^{(1)} \otimes \varphi^{(2)} \otimes \varphi^{(3)};$$

the partially entangled states which form three orbit classes characterized by
the separability of the canonical tensors

$$\psi^{(1)} = \sum_{i,j,k} \varphi^{(1)}_i \rho^{(2,3)}_{jk}\, e_i \otimes e_j \otimes e_k, \qquad
\psi^{(2)} = \sum_{i,j,k} \varphi^{(2)}_j \rho^{(1,3)}_{ik}\, e_i \otimes e_j \otimes e_k, \qquad
\psi^{(3)} = \sum_{i,j,k} \rho^{(1,2)}_{ij} \varphi^{(3)}_k\, e_i \otimes e_j \otimes e_k;$$

the completely entangled states equivalent to the GHZ state

$$\psi_{ghz} = \tfrac{1}{\sqrt{2}}\,(e_0 \otimes e_0 \otimes e_0 + e_1 \otimes e_1 \otimes e_1);$$

and the completely entangled states equivalent to the W state

$$\psi_{w} = \tfrac{1}{\sqrt{3}}\,(e_0 \otimes e_0 \otimes e_1 + e_0 \otimes e_1 \otimes e_0 + e_1 \otimes e_0 \otimes e_0).$$

The tangle and the concurrence and its partial counterparts can be used to
fully distinguish these orbit classes. For the completely disentangled tensors
we have

$$\mathcal{C}_a(\psi) = 0, \qquad T(\psi) = 0,$$

for all $a = 1, 2, 3$. Whereas for the first partially entangled state we have

$$\mathcal{C}_1(\psi^{(1)}) \neq 0, \qquad \mathcal{C}_2(\psi^{(1)}) = \mathcal{C}_3(\psi^{(1)}) = 0, \qquad T(\psi^{(1)}) = 0,$$

and similar relations for the remaining two partially entangled states.
States in the GHZ orbit satisfy

$$\mathcal{C}_a(\psi_{ghz}) \neq 0, \qquad T(\psi_{ghz}) \neq 0$$

for all $a = 1, 2, 3$. Whereas states on the W orbit satisfy

$$\mathcal{C}_a(\psi_{w}) \neq 0, \qquad T(\psi_{w}) = 0$$

for all $a = 1, 2, 3$.

Notice that the GHZ and W orbits characterize different classes of three qubit
entanglement. In the GHZ orbit each qubit is entangled with the other two 
qubits and the three qubits are entangled as a triplet. In the W orbit the 
qubits are entangled as pairs but are not entangled as a triplet. 
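
The distinction between the GHZ and W orbits can be checked numerically. The sketch below assumes that the tangle coincides, up to normalisation, with Cayley's hyperdeterminant of the 2 x 2 x 2 coefficient tensor; that identification, together with the helper names `hyperdet` and `tensor`, is an assumption of this example rather than something taken from the text.

```python
from math import sqrt


def hyperdet(a):
    """Cayley hyperdeterminant of a 2x2x2 tensor stored as a[i][j][k]."""
    a000, a001, a010, a011 = a[0][0][0], a[0][0][1], a[0][1][0], a[0][1][1]
    a100, a101, a110, a111 = a[1][0][0], a[1][0][1], a[1][1][0], a[1][1][1]
    return (a000**2 * a111**2 + a001**2 * a110**2
            + a010**2 * a101**2 + a100**2 * a011**2
            - 2 * (a000 * a001 * a110 * a111 + a000 * a010 * a101 * a111
                   + a000 * a100 * a011 * a111 + a001 * a010 * a101 * a110
                   + a001 * a100 * a011 * a110 + a010 * a100 * a011 * a101)
            + 4 * (a000 * a011 * a101 * a110 + a001 * a010 * a100 * a111))


def tensor(entries):
    """Build a 2x2x2 tensor from a dict {(i, j, k): value}; other entries are zero."""
    t = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for (i, j, k), value in entries.items():
        t[i][j][k] = value
    return t


ghz = tensor({(0, 0, 0): 1 / sqrt(2), (1, 1, 1): 1 / sqrt(2)})
w = tensor({(0, 0, 1): 1 / sqrt(3), (0, 1, 0): 1 / sqrt(3), (1, 0, 0): 1 / sqrt(3)})

print(hyperdet(ghz), hyperdet(w))
```

Under this assumption the invariant is non-zero on the GHZ state and vanishes on the W state, in agreement with the orbit conditions listed above.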



3.2 Stochastic evolution of biomolecular units

It is standard to model sequence evolution as a stochastic process. A discrete
set $\mathcal{K}$ is associated with biomolecular units which we refer to as bases, and we define
$n := |\mathcal{K}|$. For example, in the case of DNA sequences made up of the four
nucleotides adenine, cytosine, guanine, thymine, we have $\mathcal{K} = \{A, G, C, T\}$
and $n = 4$. The instance of a particular base in the sequence is equated
with the time dependent random variable $X(t) \in \mathcal{K}$ and the stochastic time
evolution is modelled as a continuous time Markov chain (CTMC) so that

$$\frac{d}{dt}\,\mathbb{P}(X(t) = i) = \sum_{j} q_{ij}(t)\,\mathbb{P}(X(t) = j), \qquad i, j \in \mathcal{K}. \tag{3.4}$$

The $q_{ij}(t)$ are called rate parameters and must satisfy the relations

$$q_{ij}(t) \geq 0, \quad \forall\, i \neq j; \qquad q_{ii}(t) = -\sum_{j \neq i} q_{ji}(t). \tag{3.5}$$

Define $Q(t) = [q_{ij}(t)]_{(i,j \in \mathcal{K})}$ as the rate matrix associated with the Markov
chain. The Markov chain is called homogeneous if the rate matrix is time
independent. The results presented in this thesis are equally valid for
inhomogeneous models where the rate matrix is time dependent and so we allow for
this generality throughout. It is also common to impose further symmetries
upon the rate matrix such as the Jukes Cantor and Kimura 3ST models [44].
However, the results presented here are again valid for any rate matrix
satisfying (3.5), and hence no restriction upon the rate parameters is made. This
model is referred to as the general Markov model pQ .

For notational simplicity we will write $\pi_i(t) := \mathbb{P}(X(t) = i)$ and, given an
initial distribution $\pi_i(0)$, write solutions of (3.4) as

$$\pi_i(t) = \sum_{j} m_{ij}(t,s)\,\pi_j(s), \qquad 0 \leq s \leq t;$$

where $m_{ij}(t,s) := \mathbb{P}(X(t) = i \mid X(s) = j)$ are the transition probabilities of
the chain. We define the matrix $M(t,s) = [m_{ij}(t,s)]_{(i,j \in \mathcal{K})}$ such that in the
homogeneous case the transition probabilities only depend on the difference
$(t - s)$ and can be represented in terms of the rate matrix as

$$M(t,s) = M(t-s, 0) = e^{Q(t-s)} := \sum_{n=0}^{\infty} \frac{Q^n (t-s)^n}{n!}.$$

In the inhomogeneous case there are several representations available for the
matrix of transition probabilities (for details see [29] [50]). The representation
that is of most use to us here is the time-ordered product:

$$M(t,s) = \mathrm{T}\exp \int_s^t Q(u)\,du \tag{3.6}$$

(see for example [20] for the definition of the time-ordering operator T). For
sufficiently small $\delta t$, we can write this in the approximate form

$$M(t,s) \simeq M(t, t - \delta t) \cdots M(s + 2\delta t, s + \delta t)\, M(s + \delta t, s)
= e^{Q(t - \delta t)\delta t}\, e^{Q(t - 2\delta t)\delta t} \cdots e^{Q(s + \delta t)\delta t}\, e^{Q(s)\delta t}.$$

From these solutions it is clear that

$$\det[M(t,s)] = \exp \int_s^t \mathrm{tr}[Q(u)]\,du. \tag{3.7}$$
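
The determinant identity (3.7) can be checked against the stepwise approximation of the time-ordered product. The following is a minimal sketch for two states; the rate functions inside `rate_matrix`, the step count, and the helper names `matmul`, `expm` and `transition` are illustrative assumptions, and the matrix exponential is a truncated Taylor series rather than a library routine.

```python
from math import exp


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(a, terms=30):
    """Matrix exponential by truncated Taylor series (adequate for small norms)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def rate_matrix(u):
    """Illustrative time dependent rate matrix; each column sums to zero."""
    a, b = 0.3 + 0.1 * u, 0.2
    return [[-a, b], [a, -b]]


def transition(s, t, steps=1000):
    """Approximate M(t, s) by a product of short-time exponentials."""
    dt = (t - s) / steps
    m = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(steps):
        q = rate_matrix(s + k * dt)
        step = expm([[x * dt for x in row] for row in q])
        m = matmul(step, m)   # later times act on the left
    return m


M = transition(0.0, 1.0)
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]
expected = exp(-0.55)   # integral of tr Q(u) = -(0.5 + 0.1 u) over [0, 1]
print(det_M, expected)
```

The two numbers agree up to the discretisation error of the left Riemann sum, and each column of `M` still sums to one.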

A more fundamental way to define the transition matrices of a CTMC is to 
impose the backward and forward Kolmogorov equations [2H] : 

$$\frac{\partial M(t,s)}{\partial s} = -M(t,s)\,Q(s), \qquad
\frac{\partial M(t,s)}{\partial t} = Q(t)\,M(t,s). \tag{3.8}$$



3.3 Phylogenetic trees 

The remaining task is to model the case of phylogenetically related molecular 
sequences evolving under a stochastic process. Effectively the model consists 
of multiple copies of the random variable X(t) taken as a generalization (via 
a tree structure) of a cartesian product and then modelled collectively as a
CTMC. The reader is referred to [53] for a more extended discussion of the
model. Here we keep the presentation to a minimum while allowing for the
introduction of some essential notation and concepts.

Figure 3.1: Phylogenetic tree of four taxa

A tree, $T$, is a connected graph without cycles and consists of a set of vertices,
$V$, and edges, $E$. Vertices of degree one are called leaves and we partition
the set of vertices as $V = L \cup N$ where $L$ is the set of leaves and $N$ is the
set of internal vertices. We work with orientated trees, which are defined by
directing each edge of $T$ away from a distinguished vertex, $\pi$, known as the
root of the tree. Consequently, a given edge lying between vertices $u$ and $v$
is specified as an ordered pair $e = (u, v)$, where $u$ lies on the (unique) path
between $v$ and $\pi$. The general Markov model of a phylogenetic tree is then
made by assigning a set of random variables $\{X_s,\ s \in V\}$ to the vertices of
the tree; these random variables are assumed to be conditionally independent
and individually satisfy the properties of a CTMC. Taking a distribution at
the root of the tree, $\{\mathbb{P}(X_\pi = i) := \pi_i,\ i \in \mathcal{K}\}$, completes the specification
of the phylogenetic tree. The interpretation of a phylogenetic tree is that the
probability distribution at each leaf is associated with the observed sequence of
a single taxon and the joint probability distribution across a number of leaves
is associated with the aligned sequences of the same number of taxa.

For example in Figure 3.1 we present the tree consisting of four leaves which
has probability distribution

$$p_{i_1 i_2 i_3 i_4} = \sum_{j,k} m^{(1)}_{i_1 j}\, m^{(2)}_{i_2 j}\, m^{(3)}_{i_3 k}\, m^{(4)}_{i_4 k}\, m^{(5)}_{k j}\, \pi_j,$$

where

$$p_{i_1 i_2 i_3 i_4} := \mathbb{P}(X_1 = i_1, X_2 = i_2, X_3 = i_3, X_4 = i_4),$$

and we refer to these quantities as pattern probabilities.



3.4 Tensor presentation

Setting $\mathbb{P}(X(t) = i) = p_i(t)$, we introduce the $n$-dimensional vector space $V$
with preferred basis $\{e_1, e_2, \ldots, e_n\}$ and associate the probabilities uniquely
with the vector

$$p(t) = p_1(t)e_1 + p_2(t)e_2 + \ldots + p_n(t)e_n.$$



The time evolution of this vector is then governed by equation (3.4) written
in operator form as

$$\frac{d}{dt}\,p(t) = Q(t)\,p(t).$$

The solution of this equation is written as

$$p(t) = M(t,s)\,p(s).$$

The probabilities can be recovered by taking the inner product

$$p_i(t) = (e_i, p(t)),$$

and defining

$$\theta = \sum_{i=1}^{n} e_i, \tag{3.9}$$

we have

$$(\theta, p(t)) = 1, \qquad \forall\, t.$$

In analogy we label the joint probabilities as

$$p_{i_1 i_2 \ldots i_m}(t) := \mathbb{P}(X_1 = i_1, X_2 = i_2, \ldots, X_m = i_m;\ t),$$

and by introducing the tensor product space $V^{\otimes m}$ we associate these
probabilities with the unique tensor

$$P(t) = \sum_{i_1, i_2, \ldots, i_m} p_{i_1 i_2 \ldots i_m}(t)\; e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_m}.$$

Again the probabilities are recovered from the inner product:

$$p_{i_1 i_2 \ldots i_m}(t) = (e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_m},\ P(t)),$$

and we define $\Omega = \sum_{i_1, i_2, \ldots, i_m} e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_m}$ so that

$$(\Omega, P(t)) = 1, \qquad \forall\, t.$$
We now introduce the branching events into this formalism.
Consider a vertex on a phylogenetic tree where the stochastic evolution of
a single random variable branches into that of two random variables. The
corresponding mathematical operation is a mapping $V \to V \otimes V$. In order to
formalize this we introduce the branching operator $\delta : V \to V \otimes V$. The most
general action of a (linear) operator $\delta$ upon the basis elements of $V$ can be
expressed as

$$\delta e_i = \sum_{j,k} \Gamma_i^{jk}\, e_j \otimes e_k, \tag{3.10}$$

where the $\Gamma_i^{jk}$ are a set of coefficients, arbitrary at this stage, which are fixed
below by the assumption of conditional independence across branches of the tree.

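To this end it is only necessary to consider initial probability distributions of
the form

$$\pi^{(\gamma)} = \sum_{i} \delta^{\gamma}_{i}\, e_i, \qquad \gamma = 1, 2, \ldots, n.$$

Directly subsequent to the branching event the two leaf state is given by

$$P^{(\gamma)} = \delta\,\pi^{(\gamma)} = \sum_{i,j,k} \delta^{\gamma}_{i}\, \Gamma_i^{jk}\, e_j \otimes e_k.$$

We implement the conditional independence upon the branches by setting

$$\mathbb{P}(X_1 = i_1, X_2 = i_2, t = t' \mid X_1 = X_2 = \gamma, t = 0)
= \mathbb{P}(X_1 = i_1, t = t' \mid X_1 = \gamma, t = 0)\;\mathbb{P}(X_2 = i_2, t = t' \mid X_2 = \gamma, t = 0). \tag{3.11}$$

Using the tensor formalism the transition probabilities can be expressed as

$$\mathbb{P}(X_1 = i_1, t = t' \mid X_1 = \gamma, t = 0) = \sum_{k_1} m^{(1)}_{i_1 k_1}(t')\,\delta^{\gamma}_{k_1},$$

$$\mathbb{P}(X_2 = i_2, t = t' \mid X_2 = \gamma, t = 0) = \sum_{k_2} m^{(2)}_{i_2 k_2}(t')\,\delta^{\gamma}_{k_2},$$

$$\mathbb{P}(X_1 = i_1, X_2 = i_2, t = t' \mid X_1 = X_2 = \gamma, t = 0)
= \sum_{k_1, k_2, k_3} m^{(1)}_{i_1 k_1}(t')\, m^{(2)}_{i_2 k_2}(t')\, \Gamma_{k_3}^{k_1 k_2}\, \delta^{\gamma}_{k_3}.$$

Implementing (3.11) leads to the requirement that

$$\Gamma_{\gamma}^{k_1 k_2} = \delta^{k_1}_{\gamma}\, \delta^{k_2}_{\gamma},$$

and the basis dependent definition of the branching operator

$$\delta e_i = e_i \otimes e_i.$$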

From this construction we can express the phylogenetic tree of Figure 3.1 as

$$P = (1 \otimes 1 \otimes M_3 \otimes M_4)\,(1 \otimes 1 \otimes \delta)\,(M_1 \otimes M_2 \otimes M_5)\,(1 \otimes \delta)\,\delta\pi,$$

which can also be written in the more convenient form

$$P = (M_1 \otimes M_2 \otimes M_3 \otimes M_4)\,(1 \otimes 1 \otimes \delta)\,(1 \otimes 1 \otimes M_5)\,(1 \otimes \delta)\,\delta\pi.$$

This form can be generalized so that any phylogenetic tree can be expressed
in the form

$$P := M_1 \otimes M_2 \otimes \ldots \otimes M_m\, \widetilde{P}, \tag{3.12}$$

with $M_a \in GL(n)$, $1 \leq a \leq m$, and where $\widetilde{P}$ is found by taking $P$ and setting the
Markov operators on the leaf edges, $M_1, M_2, \ldots, M_m$, all equal to the identity
operator. This representation will be of importance to us as we consider
invariant theory in terms of phylogenetics. 
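
The operator expression for the four leaf tree can be made concrete by storing a tensor as a dictionary from index tuples to coefficients. The following is a minimal sketch for two character states; the helpers `branch`, `apply` and `markov` and all numerical values are illustrative assumptions, and Markov matrices are taken in the column convention `m[i][j] = P(i | j)` used above.

```python
def branch(t, slot):
    """Apply the branching operator to one slot: e_i -> e_i (x) e_i."""
    return {idx[:slot + 1] + (idx[slot],) + idx[slot + 1:]: v
            for idx, v in t.items()}


def apply(t, slot, m):
    """Apply the matrix m (column convention) to one slot of the tensor."""
    out = {}
    for idx, v in t.items():
        for i in range(len(m)):
            new = idx[:slot] + (i,) + idx[slot + 1:]
            out[new] = out.get(new, 0.0) + m[i][idx[slot]] * v
    return out


def markov(a, b):
    """Illustrative 2x2 Markov matrix with columns summing to one."""
    return [[1 - a, b], [a, 1 - b]]


pi = {(0,): 0.6, (1,): 0.4}          # root distribution
M1, M2, M3 = markov(0.1, 0.2), markov(0.3, 0.1), markov(0.2, 0.2)
M4, M5 = markov(0.05, 0.15), markov(0.25, 0.1)

P = branch(pi, 0)                    # delta pi
P = branch(P, 1)                     # (1 x delta)
P = apply(P, 2, M5)                  # (1 x 1 x M5)
P = branch(P, 2)                     # (1 x 1 x delta)
for slot, m in enumerate((M1, M2, M3, M4)):
    P = apply(P, slot, m)            # leaf edges

total = sum(P.values())
direct = sum(M1[0][j] * M2[1][j] * M3[0][k] * M4[1][k] * M5[k][j] * pi[(j,)]
             for j in range(2) for k in range(2))
print(total, P[(0, 1, 0, 1)], direct)
```

The coefficients sum to one, and each entry agrees with the component form of the pattern probabilities obtained by summing over the two internal states.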
3.5 Entanglement and phylogenetics

In this final section we will study the properties of a phylogenetic tensor
evaluated on invariant functions of the general linear group. Recalling (3.7), we
see that in all reasonable cases the determinant of the transition matrices of
a phylogenetic tree is non-zero. This implies that the transition matrices are
elements of $GL(n)$. Thus in the case of a phylogenetic tensor of the form
(3.12), an invariant will take the form

$$f(P) = \prod_{a=1}^{m} \det(M_a)^k\, f(\widetilde{P}).$$

Presently we study the case where $|\mathcal{K}| = 2$ and the phylogenetic tensor occurs
in the tensor product space relevant to two qubits and three qubits respectively.

3.5.1 Two qubits 

For the case of two qubits the most general phylogenetic tensor is given by

$$P = (M_1 \otimes M_2)\,\delta\pi, \tag{3.13}$$

which corresponds to the tree of Figure 3.2. Following (3.12) we have

$$\widetilde{P} = \delta\pi.$$

As will be discussed in detail in Chapter 4, the concurrence can be used to
establish the magnitude of divergence between a pair of sequences derived from
a single branching event. The concurrence of the phylogenetic state (3.13) is
given by

$$\mathcal{C}(P) = \det[M_1]\det[M_2]\;\mathcal{C}(\delta\pi).$$

Explicitly we have

$$\delta\pi = \pi_1\, e_1 \otimes e_1 + \pi_2\, e_2 \otimes e_2,$$

and find that

$$\mathcal{C}(P) = \det[M_1]\det[M_2]\,\pi_1\pi_2.$$

Figure 3.2: Phylogenetic tree with two leaves

Assuming that the determinants of the Markov operators are non-zero we see 
that the phylogenetic tensor is on the entangled orbit. 
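
The factorisation of the concurrence can be confirmed on a small example. The sketch below assumes the concurrence of a two leaf state is the determinant of its matrix of pattern probabilities; the Markov matrices, the root distribution and the helper names are illustrative choices.

```python
from math import isclose


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def pattern_matrix(m1, m2, pi):
    """p[i1][i2] = sum_j m1[i1][j] * m2[i2][j] * pi[j] for a two leaf tree."""
    return [[sum(m1[i][j] * m2[k][j] * pi[j] for j in range(2))
             for k in range(2)] for i in range(2)]


M1 = [[0.9, 0.2], [0.1, 0.8]]   # columns sum to one
M2 = [[0.7, 0.1], [0.3, 0.9]]
pi = [0.6, 0.4]

P = pattern_matrix(M1, M2, pi)
concurrence = det2(P)
predicted = det2(M1) * det2(M2) * pi[0] * pi[1]

# Two stochastically independent leaves give a product state instead.
p1, p2 = [0.5, 0.5], [0.3, 0.7]
independent = [[x * y for y in p2] for x in p1]
print(concurrence, predicted, det2(independent))
```

The first two numbers coincide, while the product state has vanishing determinant.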

In comparison, if there is no stochastic dependence between the random vari- 
ables the phylogenetic state can be expressed as 

$$P = p_1 \otimes p_2,$$

which is a product state, such that the random variables $X_1$ and $X_2$ are
stochastically independent, and the concurrence vanishes. Thus the
non-vanishing of the concurrence can be used as a test of stochastic dependence
between any two molecular sequences. In Chapter 4 we will show that the
determinants of the Markov operators tend to zero as $t$ tends to infinity and
we conclude that the phylogenetic state (3.13) tends to a product state
after an infinite amount of divergence. This is what one would expect as the
case of infinite divergence should correspond exactly to the case of stochastic 
independence. 

3.5.2 Three qubits 

In this section we study the phylogenetic state

$$P = (M_1 \otimes M_2 \otimes M_3)\,(1 \otimes \delta)\,(1 \otimes M_4)\,\delta\pi, \tag{3.14}$$

which corresponds to the tree of Figure 3.3. Again following (3.12) we have

$$\widetilde{P} = (1 \otimes \delta)\,(1 \otimes M_4)\,\delta\pi.$$

We now determine which orbit the phylogenetic state (3.14) lies in. By the
general properties of the tangle we find that

$$T(P) = \prod_{i=1}^{3} (\det M_i)^2\; T(\widetilde{P}),$$

and by explicit computation

$$T(\widetilde{P}) = (\det M_4)^2 (\pi_1\pi_2)^2,$$

to conclude that

$$T(P) = (\det M_1 \det M_2 \det M_3 \det M_4)^2 (\pi_1\pi_2)^2.$$

From this we can conclude that the phylogenetic state (3.14) lies on the GHZ
orbit and the evaluation of the tangle upon three aligned sequences can be used
as a test of triplet stochastic dependence.
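
The value of the tangle on the three leaf tree can be reproduced numerically. The sketch below assumes the tangle is normalised to coincide with Cayley's hyperdeterminant, computed here as the discriminant of the binary quadratic form $\det(xA_0 + yA_1)$ built from the two slices of the tensor; that normalisation, the helper names and the numerical values are assumptions of this example.

```python
from math import isclose


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def hyperdet(a):
    """Cayley hyperdeterminant as the discriminant of det(x*A0 + y*A1)."""
    a0, a1 = a[0], a[1]
    alpha, gamma = det2(a0), det2(a1)
    beta = (a0[0][0] * a1[1][1] + a1[0][0] * a0[1][1]
            - a0[0][1] * a1[1][0] - a1[0][1] * a0[1][0])
    return beta ** 2 - 4 * alpha * gamma


def three_leaf(m1, m2, m3, m4, pi):
    """p[i1][i2][i3] = sum_{j,k} m1[i1][j] m2[i2][k] m3[i3][k] m4[k][j] pi[j]."""
    r = range(2)
    return [[[sum(m1[i1][j] * m2[i2][k] * m3[i3][k] * m4[k][j] * pi[j]
                  for j in r for k in r)
              for i3 in r] for i2 in r] for i1 in r]


M1 = [[0.9, 0.2], [0.1, 0.8]]
M2 = [[0.7, 0.1], [0.3, 0.9]]
M3 = [[0.8, 0.2], [0.2, 0.8]]
M4 = [[0.95, 0.15], [0.05, 0.85]]
pi = [0.6, 0.4]

P = three_leaf(M1, M2, M3, M4, pi)
tangle = hyperdet(P)
predicted = (det2(M1) * det2(M2) * det2(M3) * det2(M4)) ** 2 * (pi[0] * pi[1]) ** 2
print(tangle, predicted)
```

With this normalisation the computed invariant matches the closed form above and is non-zero whenever all four determinants and both root probabilities are non-zero.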
3.5.3 Phylogenetic relation

Referring to (3.7), we see that for continuous time Markov chains the
determinants of the transition matrices satisfy:

$$0 < \det M(s,t) < 1, \quad \forall\, 0 < t < \infty; \qquad \lim_{t \to \infty} \det M(t) = 0. \tag{3.15}$$

Above we have seen that for phylogenetic data of three aligned sequences
derived from a tree the tangle polynomial is non-zero, and for two aligned
sequences derived from a tree the concurrence is also non-zero. But taking
(3.15) into account we see that, if any one of the branches of a phylogenetic tree
is extended to infinite length, this will induce the vanishing of these invariant
functions, which implies that the corresponding part of the phylogenetic tensor
decouples from the overall state to form a partial product state. Thus the case
of no stochastic dependence corresponds directly to the absence of entanglement
of the tensor state, and stochastic dependence can be tested for using invariant
functions. Introducing independent time parameters for each external branch, we can
express the phylogenetic tree of Figure 3.3 as

$$P(t_1, t_2, t_3) := [M_1(0,t_1) \otimes M_2(0,t_2) \otimes M_3(0,t_3)]\,(1 \otimes \delta)\,[1 \otimes M_4]\,\delta\pi.$$

Now, as we have seen, the tangle polynomial will satisfy

$$\lim_{t_a \to \infty} T(P(t_1, t_2, t_3)) = 0, \qquad \forall\, a = 1, 2, 3.$$

For the concurrence we have

$$\lim_{t_a \to \infty} \mathcal{C}_b(P(t_1, t_2, t_3)) = 0, \tag{3.16}$$

if and only if $a \neq b$. From these observations we can conclude that we have
the limit:

$$\lim_{t_1 \to \infty} p_{i_1 i_2 i_3}(t_1, t_2, t_3) = p^{(1)}_{i_1}\, p^{(2,3)}_{i_2 i_3}(t_2, t_3),$$

and similarly for $t_2, t_3$. The phylogenetic state decouples into a partial product
state after an infinite amount of stochastic divergence. This is what one would
expect, as the branch lengths of the tree become so large that it is impossible
to observe the branching event which relates the leaves.

From these observations we define a phylogenetic relation to exist whenever 
the relevant phylogenetic tensor cannot be written as a product state. 
3.6 Closing remarks

In this chapter we have established the mathematical connection between the 
notion of entanglement and that of phylogenetic relation. We showed that 
simple group invariant functions used to quantify entanglement can be utilized 
in the phylogenetic case. We focused on the invariant function known as the 
tangle, but considered only the case of two character states. In the next 
chapter we will study the properties of the tangle in the case of three and four 
character states. 



Chapter 4 
Using the tangle 



The distance based approach to phylogenetic reconstruction using the
neighbor joining algorithm is a commonly used technique [23] [37] HHJ [52]. Under the
assumptions of a Markov model of sequence evolution, the phylogenetic
relationship is uniquely reconstructible from (suitably defined) pairwise distances
[54]. The approach relies crucially upon the calculation of distance matrices
from aligned sequence data which give a measure of the pairwise evolutionary 
distance between the extant taxa under consideration. As far as tree building 
algorithms are concerned it is required that the distances are strictly linearly 
related to the sum of the (theoretical) edge lengths of the phylogenetic tree, 
and that the parameters of the linear relation do not vary across the tree. It 
is essential to the analysis that the measure of distance chosen has both bio- 
logical and statistical as well as mathematical significance. If one assumes the 
standard Markov model, the edge lengths of a phylogenetic tree can be taken 
mathematically to be a quantity that we refer to as the stochastic distance. 
(For mathematical discussion of this quantity see Goodman [21] who refers 
to the stochastic distance as intrinsic time, and see also Barry and Hartigan 
[5] who gave a biological interpretation.) Under the assumptions of a
general Markov model the logdet formula is commonly used to obtain pairwise
distances. Further, if one may assume a stationary process then the logdet
formula can be modified to give an estimate of the actual stochastic distance
[40]. (That is, the constants of the linear relation are set by the stationarity
assumption.) 

Distance based methods and, consequently, the log det formula are often used 
in favour of other methods (such as maximum likelihood) in cases where there 
has been significant compositional heterogeneity during the evolutionary his- 
tory. The theoretical basis which motivates this usage was presented by Steel 
[56] and is discussed in Lockhart, Steel, Hendy and Penny [ID] and Gu and Li
[25]. More recently, Jermiin, Ho, Ababneh, Robinson and Larkum published a
simulation study which confirms that the log det outperforms other techniques
in this case [33]. Lockhart et al. showed that by using the assumption that the
base composition remains close to constant, the logdet formula can be mod- 
ified to give an estimate of the actual stochastic distance. However, as will 
be shown, in both its original and modified form the log det formula includes 
an approximation crucially dependent upon the compositional heterogeneity
remaining minimal. The effectiveness of the log det formula to correctly recon- 
struct the phylogenetic history when there has been significant compositional 
heterogeneity is thus brought into question. Hence there is a contradictory 
state of affairs between the theoretical basis of the log det and the circum- 
stances under which it is implemented. In this chapter we will generalize the 
log det formula in such a way that this dependence upon base composition is 
truly absent. 

A disadvantage of the log det formula is that it uses only pairwise sequence 
data and is blind to the fact that extra information regarding pairwise dis- 
tances can be obtained from the sequence data of additional taxa. Felsenstein 
[21] mentions that it is surprising that distance techniques work at all given 
that they ignore the extra information in higher order alignments. This chap- 
ter details exactly how the log det formula can be improved upon by taking 
functions of aligned sequence data for three taxa at a time. It may seem 
counter-intuitive that consideration of a third taxon can impart information 
regarding the evolutionary distance between two taxa, but it is the case that 
by considering a third taxon the log det formula can be refined. This result 
depends crucially upon the fact that, as is somewhat trivially the case for two 
taxa, there is only one possible (unrooted) tree topology relating three taxa. 
(For discussion of what a tree topology is see jB], Chapter 5.) It is possible to 
refine the log det formula by considering the respective distance to an arbitrary 
third taxon. The reader should note that the use of triplet sequence data to 
the problem of reconstruction of the Markov model was also considered in [12] 
and [32]. The approach discussed in the present chapter is original in the sense
that triplets of the aligned sequences are being used explicitly in a distance 
method, and follows on from the theoretical discussions of [5T?] . 
A complication arises regarding the total stochastic distance between leaves 
and the placement of the root of a phylogenetic tree. It turns out that if we 
define phylogenetic trees of identical topology to be equivalent if they give the 
identical probability distributions then we find that the total stochastic dis- 
tance between leaves is not, in general, left unchanged as we move the root of 
the tree. The so defined equivalence class provides a generalization of Felsen- 
stein's pulley principle [19] and was first presented in Steel, Szekely and Hendy 
[57]. The fact that the stochastic distance is not left unchanged is a surprising
result and has important implications regarding the interpretation of the edge 
lengths of phylogenetic trees defined under the Markov model. In particular 
this result implies that the log det technique is an inconsistent estimator of 
pairwise distances on phylogenetic trees. It is the purpose of this chapter to 
present a new estimator that is consistent in the case of phylogenetic quartets. 
We are motivated to present this construction of quartet distance matrices 
by the interest in phylogenetic reconstruction of large trees from the correct 
determination of the set of (™) quartets [H EE].

This chapter will begin by formally defining the stochastic distance. We will 
then examine how the general linear group invariants, the det (2.13) and the
tangle (2.17), can be used to estimate the stochastic distance between any two
taxa on a phylogenetic tree. As a consequence of this discussion we will
examine a generalized pulley principle and finish by showing that by including
the tangle in the analysis we can arrive at a consistent estimator.
Note: This chapter follows closely the text of [60].

4.0.1 Stochastic distance 

In this chapter we will be interested in the assignment of edge lengths to
phylogenetic trees. To this end we consider the rate of change of base changes
at time $s$:¹

$$\lambda(s) := \sum_{i \neq j} \frac{\partial}{\partial t}\,\mathbb{P}(X(t) = i \mid X(s) = j)\Big|_{t=s}.$$

By considering (3.5) and (3.8) this quantity can be explicitly expressed using
the rate parameters:

$$\lambda(s) = -\mathrm{tr}\,Q(s).$$

From these considerations we define the stochastic distance to be given by the
expression

$$\omega(t,s) := \int_s^t \lambda(u)\,du.$$

By considering the time-ordered product representation (3.6) and the Jacobi
identity $\det e^X = e^{\mathrm{tr} X}$, we find that the stochastic distance can be directly
related to the transition probabilities of the Markov chain:

$$\omega(t,s) = -\log\det M(t,s). \tag{4.1}$$

Our assignment of edge lengths will take the Markov matrix associated with
each edge and set the edge length equal to the stochastic distance.
The relation (4.1) is known in various guises in both the mathematical and
phylogenetic literature [5] El] and, as will be confirmed in the next section, is
the basis of the log det formula. It should also be noted that (4.1) will remain
positive and finite because $\omega(s,s) = 0$, $\lambda(s) > 0$ and the integral $\int_0^T \lambda(t)\,dt$ is
not expected to diverge.²

¹ It is standard to include a factor of $n^{-1}$ in this definition. However, this factor clutters
the consequent formulae and here we do not include it as it has no consequence to the
foregoing discussion and can always be incorporated into the analysis later.

² There are two cases where the integral may diverge, but we can safely exclude these
possibilities as follows. i. $\lambda(t)$ may be a badly behaved function. We can reject this
possibility outright in phylogenetics as there is every reason to expect the rate parameters
to change smoothly with time. ii. $T \to \infty$. We can safely ignore this possibility as we will
be assuming that the divergence times of the Markov chain are sufficiently small such that
the phylogenetic historical signal is still obtainable.
4.0.2 Observability of the stochastic distance

An interesting consideration (which at first sight is at odds with our aims)
is that given a single random process modelled as a CTMC there is simply
no way of inferring the value of the stochastic distance from an observed
distribution without making restrictive assumptions about the process and
the initial distribution. This is best illustrated by considering a stationary
CTMC for which the rate parameters are time-independent and, given an initial
distribution $\pi_i(0)$, satisfy

$$\sum_{j} q_{ij}\,\pi_j(0) = 0.$$

Now, although the consequent distribution is time-independent, $\pi_i(t) = \pi_i(0)$,
and hence carries zero informative value in comparison to the initial
distribution, the stochastic distance itself increases linearly with time:

$$\omega(t, 0) = -t\,\mathrm{tr}\,Q.$$

From this observation it is clear that in the general case, if all we have access to
is the final distribution, there is no way we can estimate the stochastic distance
unless we make some additional assumptions about the stochastic process.
The remarkable fact is that in the case of phylogenetics it is possible to estimate
the stochastic distance from the observed distribution. (As we will show in
Section 4.1 this is true even for the case where the underlying chains are
stationary!) 
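
The stationary example can be illustrated for two states. In the sketch below the rates `a` and `b`, the helper names, and the truncated Taylor series used for the matrix exponential are all illustrative assumptions.

```python
from math import log


def matmul(x, y):
    """Product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(m, terms=30):
    """Matrix exponential by truncated Taylor series (adequate for small norms)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


a, b = 0.3, 0.2
Q = [[-a, b], [a, -b]]                 # homogeneous rate matrix
pi0 = [b / (a + b), a / (a + b)]       # stationary distribution: Q pi0 = 0


def evolve(t):
    """Return the distribution at time t and the stochastic distance omega(t, 0)."""
    M = expm([[q * t for q in row] for row in Q])
    dist = [sum(M[i][j] * pi0[j] for j in range(2)) for i in range(2)]
    omega = -log(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    return dist, omega


dist1, omega1 = evolve(1.0)
dist2, omega2 = evolve(2.0)
print(dist1, dist2, omega1, omega2)
```

The distribution is unchanged at both times, while the stochastic distance doubles from 0.5 to 1.0, so the observed distribution of a single chain carries no information about it.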



4.1 Pairwise distance measures

In this section we will derive and discuss a standard approach to the
construction of distance matrices. (For an excellent perspective of the various measures
of phylogenetic pairwise distance see [3].) A distance matrix, $\phi = [\phi_{ab}]_{(a,b) \in L}$,
is constructed from the aligned sequence data of multiple extant taxa such
that each entry gives a suitable estimate of the distance between a given pair
of taxa. The mathematical conditions on the $\phi_{ab}$ are the standard conditions
of a distance function as well as the four point condition [51] (which is required
for the distance measure to be consistent with the tree structure):

$$\phi_{ab} \geq 0, \qquad \phi_{ab} = 0 \ \text{iff}\ a = b, \qquad \phi_{ab} = \phi_{ba},$$

$$\phi_{ab} + \phi_{cd} \leq \max\{\phi_{ac} + \phi_{bd},\ \phi_{ad} + \phi_{bc}\}; \qquad \forall\, a, b, c, d \in L. \tag{4.2}$$

There are no further conditions required upon $\phi$ for it to give a unique tree
reconstruction [51]. However it is of course desirable for the distance measure
to have a well defined biological interpretation. To this end, for a given edge
$e$, we define the edge length, $\omega_e$, which we set to be the stochastic distance
(4.1) taken from the Markov model:

$$\omega_e = -\log\det M_e.$$

Figure 4.1: Phylogenetic tree of two taxa

It is then apparent that any significant estimate of pairwise distance must
statistically be expected to converge to a value which is linearly related to the
sum of the stochastic distances lying on the (unique) path between the two
taxa under consideration. It should be clear that such a measure will satisfy
the relations (4.2). It is crucial to the performance of the distance measure
under a tree building algorithm that the parameters of the linear relation are
expected to be constant for all pairs of taxa. That is, given the unique path
between leaf $a$ and $b$, $P(T; a, b)$, we are demanding that statistically we have
the following convergence:

$$\phi_{ab} \to \alpha\,\omega(a,b) + \beta,$$

where

$$\omega(a,b) := \sum_{e \in P(T;a,b)} \omega_e$$

and $\alpha$ and $\beta$ are expected to be independent of $a$ and $b$. As we will see, the
log det formula does not satisfy this property for the most general models. 
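
The four point condition in (4.2) is simple to test on a candidate distance matrix. The following is a minimal sketch, assuming distances are stored as a nested dictionary; the quartet tree, its edge lengths and the function names are illustrative choices.

```python
from itertools import product


def four_point_ok(d, leaves, tol=1e-12):
    """Check d_ab + d_cd <= max(d_ac + d_bd, d_ad + d_bc) for all a, b, c, d."""
    return all(
        d[a][b] + d[c][e] <= max(d[a][c] + d[b][e], d[a][e] + d[b][c]) + tol
        for a, b, c, e in product(leaves, repeat=4)
    )


def quartet_distances(leaf_edges, internal):
    """Path-length distances on the quartet tree with cherries (1, 2) and (3, 4)."""
    side = {1: 0, 2: 0, 3: 1, 4: 1}
    d = {a: {} for a in side}
    for a in side:
        for b in side:
            if a == b:
                d[a][b] = 0.0
            else:
                crossing = internal if side[a] != side[b] else 0.0
                d[a][b] = leaf_edges[a] + leaf_edges[b] + crossing
    return d


leaves = [1, 2, 3, 4]
tree = quartet_distances({1: 1.0, 2: 2.0, 3: 1.5, 4: 0.5}, internal=3.0)

# Perturb one entry so that the distances no longer fit any tree.
broken = {a: dict(row) for a, row in tree.items()}
broken[1][2] = broken[2][1] = 10.0

print(four_point_ok(tree, leaves), four_point_ok(broken, leaves))
```

Distances given by sums of edge lengths along tree paths pass the check, while the perturbed matrix fails it.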



4.1.1 The logdet formula 

In Figure 4.1 we consider the two taxa phylogenetic tree, with pattern
probabilities given by

$$p_{i_1 i_2} = \sum_{j} m^{(1)}_{i_1 j}\, m^{(2)}_{i_2 j}\, \pi_j. \tag{4.3}$$

By considering the matrices defined as

$$P^{(1,2)} := [p_{i_1 i_2}], \qquad D_\pi := \mathrm{diag}(\pi_1, \pi_2, \ldots, \pi_n),$$

it is easy to show that (4.3) is equivalent to

$$P^{(1,2)} = M_1 D_\pi M_2^{T}.$$

Taking the determinant of this expression and considering (4.1) yields

$$\det P^{(1,2)} = \det M_1 \det M_2 \det D_\pi = e^{-(\omega_1 + \omega_2)} \prod_{i} \pi_i. \tag{4.4}$$

This expression can be generalized to the case of any two taxa from a given
phylogenetic tree:

$$\det P^{(a,b)} = e^{-\omega(a,b)} \prod_{i} \pi_i^{(a,b)}, \tag{4.5}$$

where $\pi_i^{(a,b)}$ is the distribution at the most recent ancestral vertex between taxa
a and b determined by the meeting point of the two paths traced backwards 
along the phylogenetic tree from leaf a and b. 
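
Relation (4.4) can be verified directly for a small model. The sketch below uses three character states; the Markov matrices, the root distribution and the helper names are illustrative assumptions, and the determinant is computed by cofactor expansion, which is adequate only for small matrices.

```python
from math import log, isclose


def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total


def pairwise_pattern_matrix(m1, m2, pi):
    """P[i1][i2] = sum_j m1[i1][j] * pi[j] * m2[i2][j], i.e. M1 D_pi M2^T."""
    n = len(pi)
    return [[sum(m1[i][j] * pi[j] * m2[k][j] for j in range(n))
             for k in range(n)] for i in range(n)]


# Columns of each Markov matrix sum to one.
M1 = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.2], [0.1, 0.1, 0.7]]
M2 = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
pi = [0.5, 0.3, 0.2]

P = pairwise_pattern_matrix(M1, M2, pi)
logdet_distance = -log(det(P))
omega1, omega2 = -log(det(M1)), -log(det(M2))
shift = -sum(log(x) for x in pi)
print(logdet_distance, omega1 + omega2 + shift)
```

The negative log determinant of the pattern matrix equals the total stochastic distance plus a term that depends only on the ancestral base composition.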

Now $\omega(a,b)$ is theoretically equal to the total stochastic distance between each of a and b and their most recent ancestral vertex, and hence it is clear that $-\log\det P^{(a,b)}$ will be linearly related to this quantity. In the original formulation of the logdet, a distance measure between two taxa was defined as

$$d_{ab} := -\log\det P^{(a,b)} = \omega(a,b) - \sum_i \log[\pi_i^{(a,b)}], \qquad (4.6)$$

and shown to satisfy the conditions (4.2) [56]. From this relation it seems that one can take $\alpha = 1$ and $\beta = -\sum_i \log[\pi_i^{(a,b)}]$ and evaluate (4.6) on the observed pattern frequencies for each pair of taxa to calculate a well defined distance matrix from a set of aligned sequence data (as was presented in [10]). This procedure depends crucially upon the shifting term $\beta = -\sum_i \log[\pi_i^{(a,b)}]$ being independent of a and b. However, this is only true in special circumstances, such as a star phylogeny or a constant base composition (the stationary model). In the general case, one is led to a different shifting term depending on the topology of the tree (this was noted in Sumner and Jarvis [59] and we reproduce the result here). Consider the phylogenetic tree of three taxa given in Figure 4.2, with pattern probabilities given by

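$$p_{i_1 i_2 i_3} = \sum_{j,k} m^{(1)}_{i_1 j}\, m^{(2)}_{i_2 k}\, m^{(3)}_{i_3 k}\, m^{(4)}_{k j}\, \pi_j.$$

By calculating (4.6) for the three possible pairs of taxa we find that

$$d_{12} = (\omega_1 + \omega_4 + \omega_2) - \sum_i \log \pi_i,$$

$$d_{13} = (\omega_1 + \omega_4 + \omega_3) - \sum_i \log \pi_i,$$

$$d_{23} = (\omega_2 + \omega_3) - \sum_i \log\Big[\sum_j m^{(4)}_{ij}\pi_j\Big],$$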







from which it is explicitly clear that the shifting term is not constant across 
this phylogenetic tree. The shifting term is dependent on the base composition 
at the most recent ancestral node of the two taxa and from the above example 
it is clear that this depends on the topology of the tree and is not always simply 
the root of the tree. This means that (4.6) does not produce distance matrices
whose entries are linearly related to the edge lengths of the tree, because the
entries of the matrix will depend essentially upon the topology of the tree.
It is, however, possible to obtain an estimate of the total stochastic distance 
between any two taxa by modifying the logdet formula. The ancestral base 
composition is approximated by using the harmonic mean 

$$\prod_i \pi_i^{(a,b)} \approx \Big[\prod_i \pi_i^{(a)} \pi_i^{(b)}\Big]^{1/2}, \qquad (4.7)$$

where $\pi^{(a,b)}$ is the closest common ancestral base composition between taxa a and b and $\pi_i^{(a)} := P(X_a(\tau_a) = i)$ (and similarly for b). One is then led to the formula

$$d'_{ab} := -\log\det P^{(a,b)} + \tfrac{1}{2}\sum_i \left(\log \pi_i^{(a)} + \log \pi_i^{(b)}\right), \quad \forall\, a,b \in L, \qquad (4.8)$$

where $d'_{ab}$ is then an estimator of the total stochastic distance between taxa a and b. (This form of the logdet formula was presented in [40] and [54].)
In the case of a stationary base composition model the additional assumption 
is made that 

$$\sum_j m^{(e)}_{ij} \pi_j = \pi_i, \quad \forall\, e \in E.$$

In this case we have

$$\pi_i^{(a,b)} = \pi_i^{(a)} = \pi_i^{(b)}, \quad \forall\, a,b \in L,$$

and it is clear that the harmonic mean approximation becomes an exact re- 
lation and the logdet formula is expected to converge exactly to the total 
stochastic distance between the two taxa. 
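
As a hedged illustration of the modified logdet formula, the sketch below evaluates it on exact two-state pattern probabilities, once for a stationary model and once for a compositionally heterogeneous one. The function name `corrected_logdet` and all numerical values are assumptions made for the example; transition matrices are taken to be column-stochastic.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def pairwise_joint(m1, m2, pi):
    """p[i][j] = sum_l m1[i][l] * m2[j][l] * pi[l] for a two taxon tree."""
    return [[sum(m1[a][l] * m2[b][l] * pi[l] for l in range(2))
             for b in range(2)]
            for a in range(2)]

def corrected_logdet(p):
    """Modified logdet distance for a joint matrix p[i][j] = P(X_a = i, X_b = j)."""
    pi_a = [sum(row) for row in p]                        # base composition at taxon a
    pi_b = [sum(row[j] for row in p) for j in range(2)]   # base composition at taxon b
    correction = 0.5 * sum(math.log(x) + math.log(y) for x, y in zip(pi_a, pi_b))
    return -math.log(det2(p)) + correction

# Stationary case: both matrices fix the distribution (2/3, 1/3).
PI_STAT = [2.0 / 3.0, 1.0 / 3.0]
A1 = [[0.9, 0.2], [0.1, 0.8]]
A2 = [[0.8, 0.4], [0.2, 0.6]]
d_stat = corrected_logdet(pairwise_joint(A1, A2, PI_STAT))
omega_stat = -math.log(det2(A1)) - math.log(det2(A2))

# Heterogeneous case: the root distribution is not stationary for either edge.
PI_HET = [0.6, 0.4]
B1 = [[0.9, 0.2], [0.1, 0.8]]
B2 = [[0.7, 0.1], [0.3, 0.9]]
d_het = corrected_logdet(pairwise_joint(B1, B2, PI_HET))
omega_het = -math.log(det2(B1)) - math.log(det2(B2))

print(d_stat - omega_stat)  # zero up to rounding
print(d_het - omega_het)    # small but non-zero bias
```

In the stationary example the estimator reproduces the stochastic distance, while in the heterogeneous example the approximation of the ancestral base composition leaves a residual.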



4.1.2 The tangle 



In this section we will show how the logdet formula can be generalized to
obtain, for the most general Markov models, an unbiased estimate of the
distance matrix. The basis of the technique is the existence of a measure
analogous to (4.4) which is valid for triplets.

Sumner and Jarvis [59] presented a polynomial function $\mathcal{T}$ which is known in quantum physics as the tangle and can be evaluated on phylogenetic data sets of three aligned sequences in the case of n = 2. Evaluated on the pattern probabilities of any phylogenetic tree of three taxa, {a,b,c}, the tangle takes on the theoretical value

$$\mathcal{T}(a,b,c) = e^{-2\omega(a,b,c)} \prod_i \pi_i^{2}, \qquad (4.9)$$

where

$$\omega(a,b,c) := \sum_{e \in T} \omega_e,$$

$\pi$ is the common ancestral root of the three taxa, and this relation holds independently of the particular tree topology which relates {a, b, c}. This independence upon the topology is a very nice property and is crucial to the practical use of the tangle as a distance measure. The similarity between (4.9) and (4.5) should be noted.

In this chapter we report generalized tangles, which are polynomials which satisfy (4.9) for the cases of n = 3, 4 in addition to the n = 2 case which was presented in [59]. It is possible to infer the existence of the tangles and derive their polynomial form from group theoretical considerations. Here we give forms using the completely antisymmetric (Levi-Civita) tensor, $\epsilon$, which has components $\epsilon_{i_1 i_2 \ldots i_n}$ and satisfies $\epsilon_{12\ldots n} = 1$. For the cases of n = 2, 3, 4 the tangles are given by$^3$

T — — X^ 2 

1~3 = y^^lPhi2hPjli2hPk 1 k2k3,Pl 1 l2hPm 1 m2rn^Pn 1 n2n i 

'^I\i\k\^i2k2l2^kzlirnz^l\m\n\^m2n2i2^riiizii i 

— 41 2-/1 PhjlkiPi2j2k2Phj3k i Pi i j i k i Pi$jsk 5 Pi 6 j 6 k 6 Pi 7 j 7 k 7 Pi & j 6 k 6 

' e iii2i3i4 e i5i6i7i8 e jljsj4jg e j2j6j337 e kik 5 k2k 6 £k 3 k 7 k4k8'-, 

respectively (where the summation is over every index). The expression (4.9) can be proved by studying the group theoretical properties of the tangle (see [59]) and by explicitly expanding the above forms. For the tangle on two characters we find

$$\begin{aligned}
\mathcal{T}_2 = {} & -p_{121}^2 p_{212}^2 + 2p_{121}p_{122}p_{211}p_{212} - p_{122}^2 p_{211}^2 + 2p_{112}p_{122}p_{211}p_{221} \\
& + 2p_{112}p_{121}p_{212}p_{221} - 4p_{111}p_{122}p_{212}p_{221} - p_{112}^2 p_{221}^2 - 4p_{112}p_{121}p_{211}p_{222} \\
& + 2p_{111}p_{122}p_{211}p_{222} + 2p_{111}p_{121}p_{212}p_{222} + 2p_{111}p_{112}p_{221}p_{222} - p_{111}^2 p_{222}^2.
\end{aligned}$$

Substantial computer power is required to explicitly compute $\mathcal{T}_3$ and $\mathcal{T}_4$. These polynomials have 1152 and 431424 terms, respectively.
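
For two character states the tangle coincides, up to an overall sign convention, with Cayley's hyperdeterminant of a 2x2x2 array. The sketch below is an illustration under stated assumptions rather than the computation used in the thesis: it implements the hyperdeterminant with the sign that makes it positive on tree distributions (so that its logarithm exists), uses 0-based indices, and checks the scaling behaviour on a star tree built from made-up column-stochastic matrices. The names `hyperdet` and `star_joint` are hypothetical.

```python
def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def hyperdet(a):
    """Cayley hyperdeterminant of a 2x2x2 array a[i][j][k] (0-based indices)."""
    squares = (a[0][0][0] ** 2 * a[1][1][1] ** 2 + a[0][0][1] ** 2 * a[1][1][0] ** 2
               + a[0][1][0] ** 2 * a[1][0][1] ** 2 + a[1][0][0] ** 2 * a[0][1][1] ** 2)
    pairs = (a[0][0][0] * a[1][1][1] * a[0][0][1] * a[1][1][0]
             + a[0][0][0] * a[1][1][1] * a[0][1][0] * a[1][0][1]
             + a[0][0][0] * a[1][1][1] * a[1][0][0] * a[0][1][1]
             + a[0][0][1] * a[1][1][0] * a[0][1][0] * a[1][0][1]
             + a[0][0][1] * a[1][1][0] * a[1][0][0] * a[0][1][1]
             + a[0][1][0] * a[1][0][1] * a[1][0][0] * a[0][1][1])
    quads = (a[0][0][0] * a[0][1][1] * a[1][0][1] * a[1][1][0]
             + a[1][1][1] * a[1][0][0] * a[0][1][0] * a[0][0][1])
    return squares - 2 * pairs + 4 * quads

def star_joint(m1, m2, m3, pi):
    """p[i][j][k] = sum_l m1[i][l] * m2[j][l] * m3[k][l] * pi[l] (star tree)."""
    return [[[sum(m1[a][l] * m2[b][l] * m3[c][l] * pi[l] for l in range(2))
              for c in range(2)] for b in range(2)] for a in range(2)]

# Illustrative (made-up) parameters.
M1 = [[0.9, 0.2], [0.1, 0.8]]
M2 = [[0.7, 0.1], [0.3, 0.9]]
M3 = [[0.6, 0.3], [0.4, 0.7]]
PI = [0.6, 0.4]

tangle_value = hyperdet(star_joint(M1, M2, M3, PI))
expected = (det2(M1) * det2(M2) * det2(M3)) ** 2 * (PI[0] * PI[1]) ** 2
print(tangle_value, expected)
```

The two printed numbers agree, which is the n = 2 instance of the statement that the invariant picks up the squared determinant of each edge matrix times the squared product of the root distribution.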



3 This expression for 7~2 corrects for the erroneous expression presented in 
4.1.3 Star topology

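Consider the phylogenetic tree relating three taxa with a star topology

[Figure: three taxon tree with star topology]

with pattern probabilities given by the formula

$$p_{i_1 i_2 i_3} = \sum_j m^{(1)}_{i_1 j}\, m^{(2)}_{i_2 j}\, m^{(3)}_{i_3 j}\, \pi_j.$$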

Here we will use the fact that the root of this tree is also the common ancestral 
root of any pair of the three taxa. (This is not the case in general if we allow for 
a general rooting of the tree and/or more than three taxa. The complications 
arising in these cases will be dealt with in the next section.) 
Considering the formulae (4.9) and (4.4) we are led to introduce the novel distance matrix, $\Delta$, with the pairwise distance between {a, b} given by

$$\Delta^{(c)}_{ab} := -\log \mathcal{T}(a,b,c) + \log\det P^{(a,c)} + \log\det P^{(b,c)}, \quad a,b,c \in L. \qquad (4.10)$$

From these expressions it follows that

$$\Delta^{(c)}_{ab} = \omega(a,b),$$

such that our new formula will directly give the stochastic distance between the two taxa. There is no need to make the harmonic mean approximation and this distance measure is mathematically and biologically meaningful. This is the main result of this chapter: given a set of aligned sequence data, the tangle formula (4.10) can be used to compute the exact pairwise edge lengths for any triplet. As mentioned above, the explicit polynomial form of the tangle has been computed for the cases of two, three and four bases and it is our intent that (4.10) will provide a significant improvement over the logdet formula in the calculation of pairwise distance matrices for these cases.
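
The tangle-based distance can be sketched for two character states as follows. This is an illustrative sketch, not the thesis implementation: it assumes the n = 2 tangle is Cayley's hyperdeterminant taken with the sign that is positive on tree distributions, uses column-stochastic matrices with made-up entries, and evaluates the formula on exact pattern probabilities of a star tree whose root distribution is not stationary. The name `triplet_distance` is hypothetical.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def hyperdet(a):
    """Cayley hyperdeterminant of a 2x2x2 array a[i][j][k] (0-based indices)."""
    squares = (a[0][0][0] ** 2 * a[1][1][1] ** 2 + a[0][0][1] ** 2 * a[1][1][0] ** 2
               + a[0][1][0] ** 2 * a[1][0][1] ** 2 + a[1][0][0] ** 2 * a[0][1][1] ** 2)
    pairs = (a[0][0][0] * a[1][1][1] * a[0][0][1] * a[1][1][0]
             + a[0][0][0] * a[1][1][1] * a[0][1][0] * a[1][0][1]
             + a[0][0][0] * a[1][1][1] * a[1][0][0] * a[0][1][1]
             + a[0][0][1] * a[1][1][0] * a[0][1][0] * a[1][0][1]
             + a[0][0][1] * a[1][1][0] * a[1][0][0] * a[0][1][1]
             + a[0][1][0] * a[1][0][1] * a[1][0][0] * a[0][1][1])
    quads = (a[0][0][0] * a[0][1][1] * a[1][0][1] * a[1][1][0]
             + a[1][1][1] * a[1][0][0] * a[0][1][0] * a[0][0][1])
    return squares - 2 * pairs + 4 * quads

def star_joint(m1, m2, m3, pi):
    """p[i][j][k] = sum_l m1[i][l] * m2[j][l] * m3[k][l] * pi[l] (star tree)."""
    return [[[sum(m1[a][l] * m2[b][l] * m3[c][l] * pi[l] for l in range(2))
              for c in range(2)] for b in range(2)] for a in range(2)]

def triplet_distance(p):
    """Distance between taxa a and b from p[i][j][k] = P(X_a=i, X_b=j, X_c=k)."""
    p_ac = [[p[i][0][k] + p[i][1][k] for k in range(2)] for i in range(2)]  # sum out b
    p_bc = [[p[0][j][k] + p[1][j][k] for k in range(2)] for j in range(2)]  # sum out a
    return -math.log(hyperdet(p)) + math.log(det2(p_ac)) + math.log(det2(p_bc))

# Illustrative (made-up) parameters with a non-stationary root distribution.
M_A = [[0.9, 0.2], [0.1, 0.8]]
M_B = [[0.7, 0.1], [0.3, 0.9]]
M_C = [[0.6, 0.3], [0.4, 0.7]]
PI = [0.6, 0.4]

delta_ab = triplet_distance(star_joint(M_A, M_B, M_C, PI))
omega_ab = -math.log(det2(M_A)) - math.log(det2(M_B))
print(delta_ab, omega_ab)
```

On exact pattern probabilities the two printed values coincide even though the base composition differs between the root and the leaves; with observed pattern frequencies the same function would return an estimate subject to sampling error.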

4.1.4 Summary 

Considering the stochastic distance to be the correct way to assign edge lengths 
to branches of a phylogenetic tree, we have reviewed three different ways of 
obtaining a distance measure between any two taxa a and b: 
1. $d_{ab} := -\log\det P^{(a,b)}$

2. $d'_{ab} := -\log\det P^{(a,b)} + \tfrac{1}{2}\sum_i \left(\log\pi_i^{(a)} + \log\pi_i^{(b)}\right)$

3. $\Delta^{(c)}_{ab} := -\log\mathcal{T}(a,b,c) + \log\det P^{(a,c)} + \log\det P^{(b,c)}$


where one substitutes the observed pattern frequencies into these expressions. 
From the previous considerations we found that these three distance measures 
have the following properties: 

1. When $d_{ab}$ is evaluated on a set of observed pattern frequencies, this estimator satisfies the requirements of a distance function (4.2), but is inconsistent with the general Markov model as the estimate is not expected to converge to a value that is linearly related to $\omega(a,b)$.

2. When $d'_{ab}$ is evaluated on a set of observed pattern frequencies, this estimator satisfies the requirements (4.2) and is expected to converge to a value that is linearly related to $\omega(a,b)$ whenever compositional heterogeneity is absent. In the heterogeneous case this quantity approximates $\omega(a,b)$ by using (4.7).

3. When $\Delta^{(c)}_{ab}$ is evaluated on a set of observed pattern frequencies, this estimator satisfies the requirements of (4.2) and is expected to converge exactly to $\omega(a,b)$ in all cases.

Thus we see that the tangle formula (4.10) should be a significant improvement as an empirical estimator of $\omega(a,b)$ upon both forms of the logdet formula. However, the formula (4.10) depends on taking an arbitrary third taxon, c. The question remains as to what to do in the case of constructing pairwise distances for sets of more than three taxa. The surprising answer to this question will be addressed in the next section, where we will bring into question the uniqueness of the theoretical quantity $\omega(a,b)$. The discussion has consequences for the interpretation of each of the estimators of pairwise distances that we have discussed.



4.2 Generalized pulley principle 

In this section we generalize the Felsenstein's pulley principle [W\. In its origi- 
nal formulation the pulley principle describes the unrootedness of phylogenetic 
trees where the underlying Markov model is assumed to be reversible and sta- 
tionary. Here we show how the pulley principle may be generalized to remain 
valid under the most general Markov models. Our immediate motivation is to 
show that (4.10) remains a valid distance measure under the circumstance of
a general phylogenetic tree of multiple taxa. Unfortunately this generalization
introduces surprising mathematical complications which have consequences
not only for our formula (4.10), but also for the logdet technique and any
other estimate of the stochastic distance upon a phylogenetic tree. The
discussion will lead to the consequence that, for a given tree topology, there are
multiple - actually, infinitely many - phylogenetic trees with identical prob- 
ability distributions. (These phylogenetic trees differ by arbitrary rerootings 
and consequential redirection of edges.) We will see that the generalized pulley 
principle shows that as far as inference from the observed pattern frequencies 
is concerned, there is no theoretical justification behind specifying the root 
of a phylogenetic tree if the most general Markov model is allowed. Also, we 
will see that the theoretical value of the stochastic distance is not constant for 
arbitrary rerootings of a phylogenetic tree. Clearly, if the stochastic distance 
is not uniquely defined theoretically, then one must be careful in interpreting 
any formula that gives an estimate thereof from the observed data. 
Considering a phylogenetic tree as a directed graph shows that a rerooting 
involves redirecting an edge (or part thereof). The property required is that 
the Markov chain on the involved edge is taken to progress as if time has been 
reversed, and we refer to the new chain as the time-reversed chain. This should 
be compared to the requirement of reversibility as defined in the mathematical 
literature, (for example see [29]). In the case of a stationary and reversible 
Markov chain the time-reversed chain (as we will define) is identical to the 
original chain. 

By way of example, we take the rooted tree of three taxa (14. 7p and redirect 
the relevant internal edge to give the following rerooting: 





[Figure: the three taxon tree rooted at π (left) and the rerooted tree, rooted at ρ (right).] (4.11)



Our immediate task is to infer the existence of an appropriate time-reversed Markov chain, N, such that these two phylogenetic trees give identical probability distributions. If we equate the pattern probabilities of (4.11) and contract all edges except the one we are reversing, we are led to the simple algebraic solution

$$n_{ij} = \frac{m_{ji}\pi_i}{\rho_j}. \qquad (4.12)$$

(This solution was presented in [57].) Presently we use this result to give an explicit form in the general case.

Given a CTMC X(t) with transition probabilities

$$m_{ij}(t,s) := P(X(t) = i \,|\, X(s) = j),$$

we wish to find a second CTMC, Y(t), such that, given any T > 0, we have

$$P(Y(t) = i) = \pi_i(T-t), \quad \forall\, 0 < t < T.$$

That is, if the direction of time is reversed, the second CTMC Y(t) has identical 
distribution to X(t). The uniqueness of Y(t) is a technical matter which we 
do not consider, because in the phylogenetic case there are extra restrictions 
which led to the unique solution (4.12).
Considering again the general case, we write

$$P(Y(t) = i \,|\, Y(s) = j) := n_{ij}(t,s)$$

and use (4.12) to infer the general solution

$$n_{ij}(t,s) = \frac{m_{ji}(T-s,\,T-t)\,\pi_i(T-t)}{\pi_j(T-s)}. \qquad (4.13)$$

It is trivial to show that these transition probabilities satisfy the requirements of a CTMC:

$$\sum_j n_{ji}(t,s) = 1, \quad \forall\, i,$$

$$N(t,s)N(s,u) = N(t,u),$$

where $N(t,s) = [n_{ij}(t,s)]$.

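Furthermore, by using (3.8) we find that the rate parameters of the time-reversed chain can be expressed as

$$f_{ij}(s) := \left.\frac{\partial n_{ij}(t,s)}{\partial t}\right|_{t=s} = \frac{q_{ji}(T-s)\,\pi_i(T-s)}{\pi_j(T-s)} - \sum_k \frac{\delta_{ij}\, q_{ik}(T-s)\,\pi_k(T-s)}{\pi_j(T-s)},$$

from which it follows that

$$f_{ij}(s) \geq 0, \quad \forall\, i \neq j; \qquad f_{ii}(s) = -\sum_{j \neq i} f_{ji}(s),$$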

which confirms that the $f_{ij}(s)$ are a valid set of rate parameters for a CTMC (as expected). It should be noted that even in the case where X(t) is a homogeneous chain it is certainly not the case in general that Y(t) is also homogeneous. Consider, however, the stationary and reversible case, with the respective conditions:

$$\sum_j q_{ij}\pi_j(0) = 0, \qquad q_{ij}\pi_j(0) = q_{ji}\pi_i(0),$$

where the stationarity condition ensures that

$$\pi_i(t) = \pi_i(0), \quad \forall\, t.$$

In this circumstance it follows that

$$f_{ij} = q_{ij},$$

such that Y(t) = X(t) and is hence also stationary and reversible. This was
the basis of Felsenstein's initial formulation of the pulley principle - if one 
considers only stationary and reversible Markov chains on a phylogenetic tree, 
any time-reversed chain is identical to the original Markov chain and hence 
a phylogenetic tree can be arbitrarily rerooted. We have given a continuous 
time generalization of Felsenstein's result which removes the stationary and 
reversible restriction. 

Equipped with the solution (4.13) it is possible to take any phylogenetic tree
and find an alternative tree of identical topology, but rooted in a different 
place, such that the alternative tree generates an identical probability dis- 
tribution to that of the original. This is the basis of our generalized pulley 
principle. 
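
The single-edge reversal that underlies this construction can be illustrated numerically. The sketch below is a minimal two-state example with made-up values, assuming column-stochastic matrices; `reverse_edge` is a hypothetical helper that applies Bayes' rule, n[i][j] = m[j][i] * pi[i] / rho[j] with rho = M pi, to obtain the reversed transition matrix.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def reverse_edge(m, pi):
    """Return (n, rho) with rho = M pi and n[i][j] = m[j][i] * pi[i] / rho[j]."""
    size = len(pi)
    rho = [sum(m[j][i] * pi[i] for i in range(size)) for j in range(size)]
    n = [[m[j][i] * pi[i] / rho[j] for j in range(size)] for i in range(size)]
    return n, rho

# Illustrative (made-up) edge matrix and distribution at the original root.
M = [[0.9, 0.2], [0.1, 0.8]]
PI = [0.6, 0.4]

N, RHO = reverse_edge(M, PI)

# The joint distribution over the two ends of the edge is unchanged:
# m[j][i] * pi[i] == n[i][j] * rho[j] for every pair of states.
joint_forward = [[M[j][i] * PI[i] for j in range(2)] for i in range(2)]
joint_reverse = [[N[i][j] * RHO[j] for j in range(2)] for i in range(2)]

omega_m = -math.log(det2(M))
omega_n = -math.log(det2(N))
print(joint_forward, joint_reverse)
print(omega_m, omega_n)  # the two edge lengths differ
```

Both orientations generate the same joint distribution, yet the stochastic distance assigned to the edge changes because the distributions at its two ends have different products of entries.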

The reader should note that we have proven, under the assumptions of the most 
general Markov model, that it is not possible to determine the orientation of a 
phylogenetic tree by only considering the joint probability distribution it gen- 
erates at the leaves. Thus, any procedure that attempts to determine the root 
from the observed pattern frequencies must be justified by making additional 
assumptions about the underlying stochastic process. Chang [12] showed that
the tree topology and (up to permutations of rows) the set of transition matri- 
ces, are reconstructible from the set of triples of the joint distribution at the 
leaves. This is consistent with our result as Chang explicitly prohibited in- 
ternal nodes with two incident edges and worked with unrooted/unorientated 
trees. Baake [2] showed that (up to similarity transformation) the return-trip 
matrices (in our notation M(s,t)N(t,s)) are identifiable from the set of pair- 
wise joint distributions at the leaves. Again this is consistent with our result. 
The curious aspect of the generalized pulley principle is that the stochastic distance is not conserved along the edge of the tree where the directedness was reversed. This is easy to show by considering the determinant of (4.13):

$$\det N(t,s) = \det M(T-s,\,T-t) \prod_i \frac{\pi_i(T-t)}{\pi_i(T-s)}. \qquad (4.14)$$

Thus the stochastic distance in the reversed time chain is equal to that of the original chain if and only if

$$\prod_i \frac{\pi_i(T-t)}{\pi_i(T-s)} = 1. \qquad (4.15)$$




This property of CTMCs and their time-reversed counterparts was observed by Barry and Hartigan [5]. It can be seen that in the stationary case (4.15) will certainly be true. There are other cases where (4.15) may hold but there does not seem to be any biologically sound way to interpret the required condition. In the following discussion we will consider the consequences of the generalized pulley principle upon the interpretation of distance matrices. We see that for a given observed distribution we can use the generalized pulley principle to show that there are multiple edge length assignments using the stochastic distance which are consistent with the Markov model on a phylogenetic tree. These edge length assignments differ from one another as a consequence of (4.14).

4.2.1 Interpretation 

For illustrative purposes we consider the consequence to the stochastic distance of the rerooting of a phylogenetic tree of two taxa. We consider the phylogenetic trees illustrated in Figure 4.3 and by using the generalized pulley principle define their respective transition matrices so that their probability distributions are identical:

$$n_{ij} = \frac{m_{ji}\pi_i}{\rho_j}, \qquad \rho_i = \sum_j m_{ij}\pi_j.$$

We find in the first case that we have

$$\omega_\pi(1,2) = -\log\det M_1 - \log\det M - \log\det M_2,$$

and in the second case

$$\omega_\rho(1,2) = -\log\det M_1 - \log\det N - \log\det M_2.$$

Now in general $\det M \neq \det N$ and we see that the two possible pairwise distances are not expected to be equal. However, from an empirical perspective it is impossible to distinguish these two possible theoretical scenarios because the probability distributions are identical. Now, because any estimator of the pairwise distance must be inferred from the observed distribution, we conclude that one must be careful to consider exactly what theoretical quantity one is obtaining an estimate of. For the case of the logdet formula we find that the quantity it is estimating depends essentially upon the base composition of the observed sequences as follows:

Considering the pairwise distance $d'_{ab}$ given by (4.8), from the generalized pulley principle we see that this formula will give an estimate of the stochastic distance between a and b, where the common ancestral node is placed such that the quantity

$$\chi(a,b) := \prod_i \pi_i^{(a,b)} - \Big[\prod_i \pi_i^{(a)}\pi_i^{(b)}\Big]^{1/2}$$

is minimized. Thus the logdet method will be inconsistent in the sense that, if there has been compositional heterogeneity, the pairwise distance it produces will be an estimate for the edge length assignment where $\chi(a,b)$ is minimized. This may have nothing to do with the true placement of the common ancestral vertex and it may even be the case that $\chi(a,b)$ has multiple minimum points.

[Figure 4.3: Using the generalized pulley principle. The two taxon tree rooted at π (left) and rooted at ρ (right).]

The situation amounts to the fact that, for a given phylogenetic tree, one is (potentially) using the logdet to estimate pairwise distances with a different edge length assignment for each and every pair of taxa. Clearly for the analysis of multiple taxa this could become a significant problem and any alternative approach which removes this inconsistency would be beneficial to the analysis.

We see that the consequences of the generalized pulley principle and (4.14) to the interpretation of the Markov model of phylogenetics are quite subtle. The generalized pulley principle is telling us that there is no direct way to distinguish the rootedness (and equivalently the directedness of internal edges) of phylogenetic trees. This is due to the fact that there are (infinitely) many phylogenetic trees of identical topology which generate identical probability distributions, differing only by the assignment of stochastic distance and the associated redirection of internal edges.



4.3 The quartet case

In this section we will show that in the case of a phylogenetic tree of four taxa, the tangle can be used to construct consistent quartet distance matrices. These distance matrices will be consistent in the sense that theoretically they are constructed from one topology with one edge length assignment. This should be compared to the logdet formula which in the general case can be estimating a different edge length assignment for each and every pairwise distance.

For analytic purposes we use the generalized pulley principle to root the four taxon tree in two ways, as illustrated in Figure 4.4. The difference between the two cases is simply in the directedness of the internal edge and the generalized pulley principle allows us to calculate the required transition probabilities so that the two trees generate identical probability distributions. The pattern probabilities for the two cases are given by

$$p_{i_1 i_2 i_3 i_4} = \sum_{j,k} m^{(1)}_{i_1 j}\, m^{(2)}_{i_2 j}\, m^{(5)}_{k j}\, m^{(3)}_{i_3 k}\, m^{(4)}_{i_4 k}\, \pi_j = \sum_{j,k} m^{(1)}_{i_1 j}\, m^{(2)}_{i_2 j}\, n^{(5)}_{j k}\, m^{(3)}_{i_3 k}\, m^{(4)}_{i_4 k}\, \rho_k, \qquad (4.16)$$

where to ensure the equality of the two expressions we have

$$n^{(5)}_{ij} = \frac{m^{(5)}_{ji}\pi_i}{\rho_j} \quad \text{and} \quad \rho_i = \sum_j m^{(5)}_{ij}\pi_j.$$

[Figure 4.4: Four taxa tree with alternative roots]


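From these expressions we wish to calculate the theoretical values of the formula (4.10) for each possible group of three taxa. To obtain these values one simply chooses the form of the tree such that after the deletion of a fourth taxon one is left with a three taxon tree of star topology. By sequentially deleting one taxon at a time we are led to the four star topology subtrees illustrated in Figure 4.5 and the corresponding pattern probabilities are given by the expressions

$$p^{(123)}_{i_1 i_2 i_3} = \sum_{j,k} m^{(1)}_{i_1 j}\, m^{(2)}_{i_2 j}\, m^{(3)}_{i_3 k}\, m^{(5)}_{k j}\, \pi_j,$$

$$p^{(124)}_{i_1 i_2 i_4} = \sum_{j,k} m^{(1)}_{i_1 j}\, m^{(2)}_{i_2 j}\, m^{(4)}_{i_4 k}\, m^{(5)}_{k j}\, \pi_j,$$

$$p^{(134)}_{i_1 i_3 i_4} = \sum_{j,k} m^{(1)}_{i_1 j}\, n^{(5)}_{j k}\, m^{(3)}_{i_3 k}\, m^{(4)}_{i_4 k}\, \rho_k,$$

$$p^{(234)}_{i_2 i_3 i_4} = \sum_{j,k} m^{(2)}_{i_2 j}\, n^{(5)}_{j k}\, m^{(3)}_{i_3 k}\, m^{(4)}_{i_4 k}\, \rho_k.$$
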

From this it is easy to calculate the values simply by considering the results of the previous section:

$$\begin{aligned}
\Delta^{(3)}_{12} &= \omega(1,2), & \Delta^{(4)}_{12} &= \omega(1,2), \\
\Delta^{(2)}_{13} &= \omega_\pi(1,3), & \Delta^{(4)}_{13} &= \omega_\rho(1,3), \\
\Delta^{(2)}_{14} &= \omega_\pi(1,4), & \Delta^{(3)}_{14} &= \omega_\rho(1,4), \\
\Delta^{(1)}_{23} &= \omega_\pi(2,3), & \Delta^{(4)}_{23} &= \omega_\rho(2,3), \\
\Delta^{(1)}_{24} &= \omega_\pi(2,4), & \Delta^{(3)}_{24} &= \omega_\rho(2,4), \\
\Delta^{(1)}_{34} &= \omega(3,4), & \Delta^{(2)}_{34} &= \omega(3,4),
\end{aligned} \qquad (4.17)$$

[Figure 4.5: Three taxon subtrees]

where

$$\begin{aligned}
\omega(a,b) &= \omega_a + \omega_b, \\
\omega_\pi(a,b) &= \omega_a + \omega_m + \omega_b, \\
\omega_\rho(a,b) &= \omega_a + \omega_n + \omega_b, \\
\omega_m &= -\log\det M, \\
\omega_n &= -\log\det N;
\end{aligned}$$

and we have made use of (4.14) in the form

$$\omega_n = \omega_m - \sum_i \left(\log\pi_i - \log\rho_i\right).$$



We see that for any two taxa we have two options for assigning a pairwise distance. In the cases of the pairs (12) and (34) we see that either choice is consistent with the other, whereas in the case of the pairs (13), (14), (23) and (24) the two choices lead to an inconsistent assignment of the internal edge length upon the tree. Effectively what is happening here is that for a four taxa tree there are two possible edge length assignments for the internal edge, and for a given pair of taxa (ab) and third taxon c, the tangle formula (4.10) is estimating the distance between a and b by assigning one of the two possible edge lengths to the internal edge depending on the topology of the tree.
It is possible to eliminate this inconsistency by using either a max or min criterion in the construction of the distance matrix:

$$\delta^{\max}_{ab} := \max\{\Delta^{(c)}_{ab}, \Delta^{(c')}_{ab}\}$$

or

$$\delta^{\min}_{ab} := \min\{\Delta^{(c)}_{ab}, \Delta^{(c')}_{ab}\}.$$



By making one of these choices to construct a distance matrix we choose the
directedness of the internal edge of the phylogenetic tree (4.3) consistently.
This procedure leads to an improvement of consistency upon the log det tech- 
nique for the construction of quartet phylogenetic distance matrices. It is 
hoped that this technique can be used fruitfully to improve the reconstruction 
of phylogenetic quartets, which can be used as a first step in the reconstruction 
of large phylogenetic trees P, [58] . 
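
The max/min criterion can be sketched as a small routine over the triplet-based values. The example below is illustrative only: it fills a table of theoretical values for a quartet with topology 12|34 from hypothetical edge lengths (two alternative lengths for the internal edge, one per rooting) and then builds the distance matrix by taking the maximum or minimum over the two available third taxa. The function names and all numbers are assumptions made for the example.

```python
from itertools import combinations

TAXA = (1, 2, 3, 4)

def triplet_values(edge, internal_pi, internal_rho):
    """Theoretical value for each pair (a, b) and third taxon c, quartet 12|34."""
    delta = {}
    for a, b in combinations(TAXA, 2):
        for c in TAXA:
            if c in (a, b):
                continue
            same_side = (a <= 2) == (b <= 2)
            if same_side:
                internal = 0.0  # path between a and b avoids the internal edge
            else:
                # third taxon on the 12 side selects one rooting, 34 side the other
                internal = internal_pi if c <= 2 else internal_rho
            delta[(a, b, c)] = edge[a] + internal + edge[b]
    return delta

def quartet_matrix(delta, choose=max):
    """Pairwise distances using a max (default) or min criterion over third taxa."""
    return {(a, b): choose(delta[(a, b, c)] for c in TAXA if c not in (a, b))
            for a, b in combinations(TAXA, 2)}

# Hypothetical pendant edge lengths and the two internal edge lengths.
EDGE = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.25}
delta = triplet_values(EDGE, internal_pi=0.30, internal_rho=0.22)

d_max = quartet_matrix(delta, choose=max)
d_min = quartet_matrix(delta, choose=min)
print(d_max)
print(d_min)
```

With either criterion every cross pair uses the same internal edge length, so the resulting matrix is additive on the quartet; on real data the entries of `delta` would be estimates and the selection would be made entry by entry in the same way.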



4.4 Closing remarks

In this chapter we have given a review of the standard assignment of branch weights to phylogenetic trees, reviewed the use of the logdet formula as an estimator of pairwise distances and shown how a previously unknown polynomial, the tangle, can be used to construct an improved estimator. We have generalized Felsenstein's pulley principle and used this result to show exactly how the distance matrix estimates become inconsistent when applied to the reconstruction problem of multiple taxa. We have shown that the tangle formula along with a max/min criterion can be used to remove this inconsistency and construct consistent quartet distance matrices.



Chapter 5 



Markov invariants 



In this chapter we will refine the use of invariant theory on phylogenetic trees 
by defining Markov invariants to be invariant functions specific to the gen- 
eral Markov model of sequence evolution. To achieve this we return to the 
representation theory introduced in Chapter 2 and show how the Schur functions
can be used to give a count of the existence of the Markov invariants.
A procedure which constructs the explicit polynomial form of these invariants 
will be developed and we examine, as prompted from Chapter El the structure 
of these invariants once placed on a phylogenetic tree. For the triplet and 
quartet case we show that there exist Markov invariants which have the addi- 
tional property of being phylogenetic invariants [H [TSJ [53]. These previously 
unobserved invariants can be used to achieve quartet reconstruction under the 
assumptions of the general Markov model. 



5.1 The Markov semigroup

In Chapter 3 we considered the transition matrices of a continuous time Markov chain as a subset of the general linear group, and used this property to study the structure of invariant polynomials (used as measures of entanglement in quantum physics) when evaluated on a phylogenetic tree. In this section we will close the gap between the general linear group and the subset consisting of the transition matrices of a CTMC by formally defining the Markov semigroup. (For a detailed discussion of the Lie group properties of the Markov semigroup and its relation to the affine group see [31].)

Recalling the vector $\theta = \sum_i e_i$ from Chapter 3, the Markov semigroup on n elements, $\mathcal{M}(n)$, with parameters $s \leq t < \infty$ is defined relative to $\theta$ as the subset of GL(n) which satisfies:

1. $M(s,s) = 1$,

2. $M(t',t)M(t,s) = M(t',s)$, $\forall\, s \leq t \leq t'$,

3. $(\theta, M(t,s)v) = (\theta, v)$, $\forall\, v \in V$.
In general this set does not form a group. Consider the time evolution of a probability vector p(t), defined by

$$p(t) = M(t,s)p(s), \quad s \leq t.$$

This time evolution will conserve the total probability:

$$\sum_i p_i(t) = (\theta, p(t)) = (\theta, M(t,s)p(s)) = (\theta, p(s)) = \sum_i p_i(s) = 1.$$

Defining

$$Q(s) := \left.\frac{\partial M(t,s)}{\partial t}\right|_{t=s},$$

it follows that in the $\{e_i\}$ basis, the matrix elements of $Q(t) = [q_{ij}(t)]$ satisfy

$$q_{ij}(t) \geq 0, \quad \forall\, i \neq j; \qquad q_{ii}(t) = -\sum_{j \neq i} q_{ji}(t),$$

and hence each M(t, s) is a valid transition matrix for a CTMC.
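
The composition and probability-conservation properties can be checked on a small example. The following sketch uses two made-up 2x2 column-stochastic matrices standing in for M(t, s) and M(t', t); the helper names are illustrative. It also prints the inverse of one of the matrices, which has negative entries and so cannot serve as a matrix of transition probabilities.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def act(m, v):
    """Matrix-vector product m v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def column_sums(m):
    """Sum of each column, i.e. the pairing of theta with each column."""
    return [sum(row[j] for row in m) for j in range(len(m))]

def inverse2(m):
    """Inverse of an invertible 2x2 matrix."""
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

M_TS = [[0.9, 0.2], [0.1, 0.8]]    # stands in for M(t, s)
M_T2T = [[0.7, 0.1], [0.3, 0.9]]   # stands in for M(t', t)

composed = matmul(M_T2T, M_TS)     # stands in for M(t', s)
p_s = [0.6, 0.4]
p_t2 = act(composed, p_s)
inv = inverse2(M_TS)

print(column_sums(composed))  # each column still sums to one
print(sum(p_t2))              # total probability is conserved
print(inv)                    # contains negative entries
```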
In Chapter 3 we saw that the Markov model of phylogenetics can be considered in terms of the action $\times^m GL(n)$ on $V^{\otimes m}$ (3.12). We refine this to the action of $\times^m \mathcal{M}(n)$ on $V^{\otimes m}$ so that any phylogenetic tensor can be written as

$$P = M_1 \otimes M_2 \otimes \ldots \otimes M_m \tilde{P},$$

with $M_a \in \mathcal{M}(n)$, $1 \leq a \leq m$. Our present task will be to define and derive invariant functions, $w : V^{\otimes m} \to \mathbb{C}$, which satisfy

$$w(P) = \prod_{a=1}^{m} \det(M_a)^k\, w(\tilde{P}),$$

for all $M_a \in \mathcal{M}(n)$, $1 \leq a \leq m$, and analyse their relevance to the problem of phylogenetic tree reconstruction. (It should be noted that an invariant of the general linear group is certainly an invariant of the Markov semigroup, but the converse is not necessarily true.)



5.1.1 Invariant functions of the Markov semigroup 

Before considering the more general case of the action $\times^m \mathcal{M}(n)$ on $V^{\otimes m}$ given by

$$\psi \to M_1 \otimes M_2 \otimes \ldots \otimes M_m \psi,$$

we will first define invariant functions of the action $\mathcal{M}(n)$ on $V^{\otimes m}$ given by

$$\psi \to M \otimes M \otimes \ldots \otimes M \psi.$$

Given that $\mathcal{M}(n)$ does not form a group we have to be careful in our definitions of representations and invariant functions. To this end we define the set of functions $\mathbb{C}[V^{\otimes m}]_d^{\mathcal{M}(n)}$ as the subset $w \in \mathbb{C}[V^{\otimes m}]_d$ which satisfy

$$w \circ \otimes^m M = \det(M)^k w, \quad \forall\, M \in \mathcal{M}(n), \qquad md = kn, \qquad (5.1)$$

(where we have carefully not invoked the inverse element $M^{-1}$). Presently we will derive a sufficient condition for the existence of such invariant functions.
Consider w £ C[V]d satisfying (15. lft . Under the canonical isomorphism u : 
C[V® m ] d -> (U 0m ) {d} (EH]) we have 

w = w(x), 

for some x £ 

(y®m-j{d}_ Carefully taking note of the relations (J23) and (12~TD 

it follows that 

u( X )°M® m = u(® md M t x). 
Hence w := u)(x) will satisfy ( 15. ip if and only if 

® md M l X = det{M) k X , VM £ .M(n). (5.2) 
Consider the tensor £ <g> expressed as 

with 77 £ \/{ fcn }. Recalling ( I2.2.6P and the definition of the Markov semigroup 
it follows that satisfies ( 15.21) : 

® kn+s M l (t) = det(M)>, VM £ M{n). 

Consider the decomposition of V^- k "^ ® U® s into irreducible representation 
spaces of GL(n): 

\X\=kn+s 

for some unknown multiplicities h\. Our present task is to identify the ir- 
reducible representation space in which the tensor <fi is contained. Assume 
£ V 1 with |/i| = kn + s and recall that 

yn _ Y® kn+S 
where is the projection operator satisfying 

Y 2 = Y 

5^^' = 0, l/i'l = 






so that Y/j, is the unique Young operator satisfying 

Y,<j> = <\>. 

Considering the inherent permutation symmetry of 0, it is clear that 

(jl = {k + s,k n - 1 }. 

From this we conclude that G V^ k+S,hn }, and there exists x {V® m )^ 
satisfying (15.21) whenever 

as an irreducible subspace under GL{n). 

Proposition 5.1.1. A sufficient condition for the existence of a Markov invariant $w \in \mathbb{C}[V^{\otimes m}]_d^{\mathcal{M}(n)}$ is that $(\times^m\{1\}) \otimes \{d\} \ni \{k+s, k^{n-1}\}$ for some $md = nk + s$.

In direct analogy to the development of Theorem 2.3.3 we generalize this to the action of $\times^m \mathcal{M}(n)$ on $V^{\otimes m}$:

Proposition 5.1.2. A sufficient condition for the existence of a Markov invariant $w \in \mathbb{C}[V^{\otimes m}]_d^{\times^m \mathcal{M}(n)}$ is that $*^m\{k+s, k^{n-1}\} \ni \{d\}$ for some $d = nk + s$.

Using the representation theoretical tools we have developed it does not seem 
trivial to show that these conditions are also necessary. However we now have 
at our disposal a tool for inferring the existence of Markov invariants in various 
cases. 

In the next section we will return to the construction of invariants for the 
general linear group in order to derive a technique allowing us to compute 
these Markov invariants. 

5.2 Alternative computation of invariants of 
the general linear group 

The construction of invariants of the general linear group was presented in 
Chapter 2 using the properties of the Levi-Civita tensor. Unfortunately this
construction does not generalize to the case of the Markov semigroup. In this 
section we show how Young tableaux can be used to construct the invariant 
functions of GL(n) directly. In the next section we show how this technique 
can be generalized to allow for the construction of the Markov invariants. 






5.2.1 Action of GL(n) on V® m 

Recall that the number of invariants of weight k in $\mathbb{C}[V^{\otimes m}]_d^{GL(n)}$ is equal to the number of occurrences of the partition $\{k^n\}$ in $(\times^m\{1\}) \otimes \{d\}$ with kn = md. This gives us a technique for the proof of existence of invariant polynomials, but leaves us with the problem of their explicit construction. Recalling Theorem 2.3.2, we see that our task is to identify the one-dimensional representations of the general linear group in the decomposition of $(V^{\otimes m})^{\{d\}}$.

Suppose we consider $U = V^{\otimes m}$ as an ($n^m$-dimensional) vector space with basis $u_1, u_2, \ldots, u_{n^m}$. As we saw in Chapter 2, if U has a basis $u_1, u_2, \ldots, u_{n^m}$ then any $\psi \in U^{\{d\}}$ can be constructed from an arbitrary $\chi \in U^{\otimes d}$ by taking

$$\psi = Y_{\{d\}}\chi,$$

where the Young operator acts on the $\{u_{a_1} \otimes u_{a_2} \otimes \ldots \otimes u_{a_d}\}$ basis of $U^{\otimes d}$, $1 \leq a_1, \ldots, a_d \leq n^m$. Now we define

$$\phi := Y_{\{k^n\}}\psi,$$

where the Young operator now acts on the $\{e_{i_1} \otimes e_{i_2} \otimes \ldots \otimes e_{i_{dm}}\}$ basis of $(V^{\otimes m})^{\{d\}} = U^{\{d\}}$, $1 \leq i_1, \ldots, i_{dm} \leq n$. The final step is to construct the single independent component of $\phi$ using the semi-standard tableau

    1  1  ...  1
    2  2  ...  2
    .  .       .
    n  n  ...  n

and then map over to the invariant ring using $\omega : (V^{\otimes m})^{\{d\}} \to \mathbb{C}[V^{\otimes m}]_d$. The invariant is then

$$f := \omega(\phi) = \omega(Y_{\{k^n\}}\psi) = \omega(Y_{\{k^n\}} Y_{\{d\}}\chi),$$

which will satisfy

$$f \circ g = \det(g)^k f,$$

for all $g \in GL(n)$.

There is no problem with choosing the operator $Y_{\{d\}}$ as there is only one
possible standard tableau:

\[ \begin{array}{|c|c|c|c|} \hline 1 & 2 & \cdots & d \\ \hline \end{array} \]

However, there does not seem to be any a priori way of deciding which standard
tableau to use for the symmetrization $Y_{\{k^n\}}$. In general there are more
standard tableaux than one-dimensional representations. This is not a serious
issue since the Young symmetrization procedure needs to be implemented in
a computer algebra package. Our procedure was to make judicious
choices of standard tableaux and check for algebraic independence of
the resulting invariants until the correct count was achieved. In what follows
we will present the results of these computations.
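
The mismatch between the number of standard tableaux and the number of invariants can be made concrete with the hook length formula. This is a standard combinatorial result and not part of the construction above. The following is a minimal Python sketch; the function name is hypothetical.

```python
from math import factorial


def count_standard_tableaux(shape):
    """Count standard Young tableaux of a partition with the hook length formula."""
    size = sum(shape)
    # Column lengths of the diagram (the conjugate partition).
    columns = [sum(1 for row in shape if row > c) for c in range(shape[0])]
    hook_product = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1          # boxes to the right in the same row
            leg = columns[j] - i - 1   # boxes below in the same column
            hook_product *= arm + leg + 1
    return factorial(size) // hook_product


for shape in [(2, 2), (2, 2, 2), (2, 1), (3, 1), (2, 1, 1)]:
    print(shape, count_standard_tableaux(shape))
```

For instance, the shape $\{2^3\}$ used in the GL(3) example of Section 5.2.2 has five standard tableaux but occurs with multiplicity two, so some choices of tableau must produce dependent invariants.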

The above outlines the formal procedure. In practice we implement the
algorithm as follows. The above is equivalent to computing

\[ \Psi_{i_1 \ldots i_{md}} := Y_{\{k^n\}}\,\psi_{i_1 \ldots i_m}\psi_{i_{m+1} \ldots i_{2m}} \cdots \psi_{i_{(d-1)m+1} \ldots i_{md}}, \tag{5.3} \]

where

\[ (ab)\,\psi_{i_1 \ldots i_m} \cdots \psi_{\ldots i_a \ldots} \cdots \psi_{\ldots i_b \ldots} \cdots \psi_{\ldots i_{md}} := \psi_{i_1 \ldots i_m} \cdots \psi_{\ldots i_b \ldots} \cdots \psi_{\ldots i_a \ldots} \cdots \psi_{\ldots i_{md}}, \]

for any $1 \leq a, b \leq md$, defines the meaning of (5.3) and there is no need to
symmetrize with $Y_{\{d\}}$. (In practice the symmetries inherent in this procedure
give us some clue as to how to choose the appropriate standard tableaux
for $\{k^n\}$.) We then set the indices of $\Psi_{i_1 \ldots i_{md}}$ using the single semi-standard
tableau to get

\[ w(\psi) = \Psi_{12 \ldots n\,12 \ldots n \ldots 12 \ldots n}. \tag{5.4} \]

Now this expression only depends on the choice of standard tableau for $\{k^n\}$.
In practice we compute (5.4) for different standard tableaux until we have the
correct number of independent invariants.
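
The procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used for the thesis: the Young operator is taken as (column antisymmetrizer)(row symmetrizer), a permutation acts by relabelling index positions, and every position in row r of the standard tableau receives the label r. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product


def parity(seq):
    """Sign of the permutation given as a sequence of distinct integers."""
    inversions = sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )
    return -1 if inversions % 2 else 1


def block_permutations(blocks):
    """Yield (mapping, sign) for each permutation acting within every block."""
    for images in product(*(permutations(block) for block in blocks)):
        mapping, sign = {}, 1
        for block, image in zip(blocks, images):
            mapping.update(zip(block, image))
            sign *= parity([block.index(x) for x in image])
        yield mapping, sign


def young_component(tableau, m, first_label=1):
    """Component of Y(psi ... psi) selected by the semi-standard filling.

    tableau: rows of a standard tableau holding 1-based index positions.
    m: number of indices carried by each factor psi.
    Returns {monomial: coefficient}, where a monomial is a sorted tuple of
    index tuples, one per factor psi.
    """
    rows = [list(row) for row in tableau]
    cols = [[row[c] for row in rows if c < len(row)] for c in range(len(rows[0]))]
    # Assumed filling: every position in row r receives the label first_label + r.
    fill = {pos: first_label + r for r, row in enumerate(rows) for pos in row}
    size = len(fill)
    poly = Counter()
    for q, sign in block_permutations(cols):       # column antisymmetrizer
        for p, _ in block_permutations(rows):      # row symmetrizer
            idx = tuple(fill[q[p[a]]] for a in range(1, size + 1))
            factors = tuple(sorted(idx[j:j + m] for j in range(0, size, m)))
            poly[factors] += sign
    return {mono: c for mono, c in poly.items() if c}


print(young_component([[1], [2]], m=2))
print(young_component([[1, 2], [3, 4]], m=2))
print(young_component([[1, 3], [2, 4]], m=2))
```

Under these conventions the three calls give, up to an overall scale, the GL(2) invariants of the examples in Section 5.2.2. Passing `first_label=0` gives the filling used for the Markov invariants of Section 5.3.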

5.2.2 Examples 

We consider the case $m = 2$. We have

\[ (\{1\} \times \{1\})^{\otimes\{1\}} = \{2\} + \{1^2\} \ni \{1^2\}, \]

and hence there is one invariant of degree $d = 1$. Of course this invariant can
simply be found by symmetrizing $V^{\otimes 2}$ with the only standard tableau of shape
$\{1^2\}$:

\[ \begin{array}{|c|} \hline 1 \\ \hline 2 \\ \hline \end{array} \]

with corresponding Young operator

\[ Y_{\{1^2\}} = (e - (12)). \]

The symmetrized tensor is

\[ \Psi_{i_1 i_2} = Y_{\{1^2\}}\psi_{i_1 i_2} = \psi_{i_1 i_2} - \psi_{i_2 i_1}. \]

The invariant is found by inserting index labels from the relevant
semi-standard tableau, so that

\[ w(\psi) = \Psi_{12} = \psi_{12} - \psi_{21}. \]

For $d = 2$ the output of Schur shows that $(\{1\} \times \{1\})^{\otimes\{2\}} \ni 2\{2^2\}$.
There are two Young operators with shape $\{2^2\}$, corresponding to the standard tableaux

\[ \begin{array}{|c|c|} \hline 1 & 2 \\ \hline 3 & 4 \\ \hline \end{array} \qquad \begin{array}{|c|c|} \hline 1 & 3 \\ \hline 2 & 4 \\ \hline \end{array} \]

For the first tableau we have

\[ Y = (e - (13) - (24) + (13)(24))(e + (12) + (34) + (12)(34)), \]

and find explicitly for the semi-standard tableau corresponding to component
$\Psi_{1212}$:

\[ h_1(\psi) = \psi_{12}^2 + 2\psi_{12}\psi_{21} + \psi_{21}^2 - 4\psi_{11}\psi_{22}, \]

and for the second tableau

\[ h_2(\psi) = \psi_{12}^2 - \psi_{12}\psi_{21} + \psi_{21}^2 - \psi_{11}\psi_{22}. \]

It is a simple exercise to show that these invariants are linear combinations of
the two invariants produced in Chapter 2 (2.10):

\[ h_1 = f_1 - 4f_2, \]

\[ h_2 = f_1 - f_2. \]

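For the case of GL(3) on $V^{\otimes 2}$ Schur shows that $(\{1\} \times \{1\})^{\otimes\{3\}} \ni 2\{2^3\}$. The
invariants are constructed from arbitrary $\psi \in (V^{\otimes 2})^{\otimes 3}$ as

\[ f = \omega(Y_{\{2^3\}} \circ Y_{\{3\}}\psi), \]

with the standard tableaux

\[ \begin{array}{|c|c|} \hline 1 & 2 \\ \hline 3 & 4 \\ \hline 5 & 6 \\ \hline \end{array} \qquad \begin{array}{|c|c|} \hline 1 & 3 \\ \hline 2 & 4 \\ \hline 5 & 6 \\ \hline \end{array} \]

generating two independent elements:

\[ \begin{aligned} h_1(\psi) = {} & -\psi_{13}^2\psi_{22} + \psi_{12}\psi_{13}\psi_{23} + \psi_{13}\psi_{21}\psi_{23} - \psi_{11}\psi_{23}^2 - 2\psi_{13}\psi_{22}\psi_{31} \\ & + \psi_{12}\psi_{23}\psi_{31} + \psi_{21}\psi_{23}\psi_{31} - \psi_{22}\psi_{31}^2 + \psi_{12}\psi_{13}\psi_{32} \\ & + \psi_{13}\psi_{21}\psi_{32} - 2\psi_{11}\psi_{23}\psi_{32} + \psi_{12}\psi_{31}\psi_{32} + \psi_{21}\psi_{31}\psi_{32} - \psi_{11}\psi_{32}^2 \\ & - \psi_{12}^2\psi_{33} - 2\psi_{12}\psi_{21}\psi_{33} - \psi_{21}^2\psi_{33} + 4\psi_{11}\psi_{22}\psi_{33}, \end{aligned} \]

and

\[ \begin{aligned} h_2(\psi) = {} & \psi_{13}^2\psi_{22} - \psi_{12}\psi_{13}\psi_{23} - \psi_{13}\psi_{21}\psi_{23} + \psi_{11}\psi_{23}^2 + \psi_{12}\psi_{23}\psi_{31} \\ & - \psi_{21}\psi_{23}\psi_{31} + \psi_{22}\psi_{31}^2 - \psi_{12}\psi_{13}\psi_{32} + \psi_{13}\psi_{21}\psi_{32} - \psi_{12}\psi_{31}\psi_{32} \\ & - \psi_{21}\psi_{31}\psi_{32} + \psi_{11}\psi_{32}^2 + \psi_{12}^2\psi_{33} + \psi_{21}^2\psi_{33} - 2\psi_{11}\psi_{22}\psi_{33}. \end{aligned} \]

Again it is possible to show that these invariants are linear combinations of
the corresponding invariants produced in Chapter 2 (2.11).

5.2.3 Action of $\times^m GL(n)$ on $V^{\otimes m}$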

Recalling Theorem 2.3.3 we note that the number of weight $k$ invariants in
$\mathbb{C}[V^{\otimes m}]_d^{\times^m GL(n)}$ is equal to the number of occurrences of $\{d\}$ in the
decomposition of $*^m\{k^n\}$. For even $m$ we have the identity

\[ *^m\{1^n\} = \{n\}, \]

and for odd $m$

\[ *^m\{1^n\} = \{1^n\}. \]

Thus we see that for even $m$ there is a single invariant function of degree $d = n$
and for odd $m$ there are none. For even $m$ the invariant is generated from

\[ \Psi_{i_1 \ldots i_{nm}} := Y^{(1)}_{\{1^n\}} Y^{(2)}_{\{1^n\}} \cdots Y^{(m)}_{\{1^n\}}\,\psi_{i_1 \ldots i_m}\psi_{i_{m+1} \ldots i_{2m}} \cdots \psi_{i_{(n-1)m+1} \ldots i_{nm}}, \]

where the standard tableau defining each $Y^{(a)}_{\{1^n\}}$, $1 \leq a \leq m$, is

\[ \begin{array}{|c|} \hline a \\ \hline m + a \\ \hline \vdots \\ \hline (n-1)m + a \\ \hline \end{array} \]

We then set the indices of $\Psi_{i_1 \ldots i_{nm}}$ using the single semi-standard tableau for
each Young operator to obtain

\[ w(\psi) = \Psi_{11 \ldots 1\,22 \ldots 2 \ldots nn \ldots n}. \]

It should be clear that this procedure reproduces the invariants
obtained using the Levi-Civita tensor in Chapter 2 (2.14). In the case $m = 2$,
this procedure generates the determinant invariants (2.13) and for $m = 4$ the
quangles (2.14).

However, as we will now see, we need to use the tableaux technique in order
to do the same job for the Markov semigroup.
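
The transformation property of the $m = 2$ invariant can be checked numerically. The sketch below assumes the Levi-Civita form $\sum \epsilon_{i_1 \ldots i_n}\epsilon_{j_1 \ldots j_n}\psi_{i_1 j_1} \cdots \psi_{i_n j_n}$, which equals $n!\det\psi$; the normalization used in (2.13) is not reproduced here, and all function names are hypothetical.

```python
from itertools import permutations


def sign(p):
    """Sign of a permutation of range(n), via its inversion count."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1


def det(a):
    """Determinant by the Leibniz formula (fine for small n)."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= a[i][p[i]]
        total += term
    return total


def levi_civita_invariant(psi):
    """Contract one Levi-Civita tensor against each of the two slots of psi."""
    n = len(psi)
    total = 0
    for s in permutations(range(n)):
        for t in permutations(range(n)):
            term = sign(s) * sign(t)
            for i in range(n):
                term *= psi[s[i]][t[i]]
            total += term
    return total


def act(g1, g2, psi):
    """Apply g1 (x) g2 to a two-index tensor psi."""
    n = len(psi)
    return [
        [sum(g1[i][a] * g2[j][b] * psi[a][b] for a in range(n) for b in range(n))
         for j in range(n)]
        for i in range(n)
    ]


psi = [[2, 1, 0], [1, 3, 1], [4, 1, 5]]
g1 = [[1, 2, 0], [0, 1, 3], [1, 0, 1]]
g2 = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
lhs = levi_civita_invariant(act(g1, g2, psi))
rhs = det(g1) * det(g2) * levi_civita_invariant(psi)
print(lhs == rhs)
```

Integer matrices are used so that the comparison is exact; the two sides agree because the invariant has weight $k = 1$ under each factor of $GL(n) \times GL(n)$.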

5.3 Computation of the Markov invariants 

Here we will generalize the above technique for computing invariants of the
general linear group to the case of the Markov semigroup. It should be noted
that in the case of the general linear group, the basis in which the calculations
are performed is of no consequence as the invariants take on the identical form
(up to scaling) in any basis. (This is by definition!) However, in the case of the
Markov invariants all calculations with Young operators must be performed
in the basis $\{z_0, z_a\}$, see Chapter 2. This is due to the very definition
of the Markov semigroup, which depends on a particular choice of the vector
$\theta = \sqrt{n}\,z_0$. Thus, in the subsequent discussion, it should be remembered that
all Markov invariants are presented in the form they take in the $\{z_0, z_a\}$ basis.

5.3.1 Markov invariants of $\mathcal{M}(n)$ on $V^{\otimes m}$

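In this section we consider the action of $\mathcal{M}(n)$ on $V^{\otimes m}$ given by

\[ \psi \to \otimes^m M\,\psi, \quad M \in \mathcal{M}(n). \]

Recalling Conjecture 5.1.1 it follows that if

\[ (\times^m\{1\})^{\otimes\{d\}} \ni \{k+s, k^{n-1}\}, \]

for some $md = nk + s$ there exists a Markov invariant $w \in \mathbb{C}[V^{\otimes m}]_d^{\mathcal{M}(n)}$. (In
all that follows it should be noted that the case $s = 0$ reproduces an invariant
of the general linear group.) We proceed by computing

\[ \Psi_{i_1 \ldots i_{md}} := Y_{\{k+s, k^{n-1}\}}\,\psi_{i_1 \ldots i_m}\psi_{i_{m+1} \ldots i_{2m}} \cdots \psi_{i_{(d-1)m+1} \ldots i_{md}}, \]

where the standard tableau of shape $\{k+s, k^{n-1}\}$ used to define $Y_{\{k+s, k^{n-1}\}}$
is not fixed, but is chosen judiciously. The final step is to compute $w(\psi)$ by
inserting indices into $\Psi$ using the semi-standard tableau:

\[ \begin{array}{|c|c|c|c|c|c|} \hline 0 & 0 & \cdots & 0 & \cdots & 0 \\ \hline 1 & 1 & \cdots & 1 \\ \cline{1-4} \vdots & \vdots & & \vdots \\ \cline{1-4} n-1 & n-1 & \cdots & n-1 \\ \cline{1-4} \end{array} \]
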

5.3.2 Examples 

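We will consider Markov invariants of degree $d = 1$ only. For the case of
$n = 2$, $m = 3$, Schur shows that

\[ (\times^3\{1\})^{\otimes\{1\}} = \times^3\{1\} \ni 2\{21\}, \]

which implies that there are two Markov invariants corresponding to $\{21\}$ with
$k = s = 1$.

There are two standard tableaux of shape $\{2, 1\}$:

\[ \begin{array}{|c|c|} \hline 1 & 2 \\ \hline 3 \\ \cline{1-1} \end{array} \qquad \begin{array}{|c|c|} \hline 1 & 3 \\ \hline 2 \\ \cline{1-1} \end{array} \]

The corresponding $d = 1$ Markov invariant follows from computing

\[ \Psi_{a_1 a_2 a_3} := Y_{\{21\}}\psi_{a_1 a_2 a_3}, \]

and then inserting indices according to the single semi-standard tableau:

\[ \begin{array}{|c|c|} \hline 0 & 0 \\ \hline 1 \\ \cline{1-1} \end{array} \]

For the first tableau we compute the symmetrized tensor

\[ \Psi_{a_1 a_2 a_3} = \psi_{a_1 a_2 a_3} + \psi_{a_2 a_1 a_3} - \psi_{a_3 a_2 a_1} - \psi_{a_2 a_3 a_1}. \]

The single independent component gives the Markov invariant

\[ \Psi^{(1)}_{001} = 2\psi_{001} - \psi_{100} - \psi_{010}. \]

The second tableau gives the symmetrized tensor

\[ \Psi_{a_1 a_2 a_3} = \psi_{a_1 a_2 a_3} + \psi_{a_3 a_2 a_1} - \psi_{a_2 a_1 a_3} - \psi_{a_3 a_1 a_2}. \]

The single independent component gives the second Markov invariant

\[ \Psi^{(2)}_{010} = 2\psi_{010} - \psi_{100} - \psi_{001}. \]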
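
The transformation property of these two degree-one invariants can be checked directly. The sketch below rests on an assumption that is not spelled out in this section: in a basis adapted to $\theta$, such as $\{z_0, z_a\}$, a Markov matrix is taken to have first row $(1, 0, \ldots, 0)$, which is how the condition $\theta^T M = \theta^T$ reads once $\theta$ is proportional to the first dual basis vector. The function names are hypothetical.

```python
from itertools import product


def act_diagonally(M, psi, m):
    """Apply M (x) M (x) ... (x) M (m copies) to the tensor psi."""
    n = len(M)
    out = {}
    for idx in product(range(n), repeat=m):
        total = 0
        for src in product(range(n), repeat=m):
            coeff = 1
            for i, a in zip(idx, src):
                coeff *= M[i][a]
            total += coeff * psi[src]
        out[idx] = total
    return out


def w1(psi):
    """First invariant: 2 psi_001 - psi_100 - psi_010."""
    return 2 * psi[0, 0, 1] - psi[1, 0, 0] - psi[0, 1, 0]


def w2(psi):
    """Second invariant: 2 psi_010 - psi_100 - psi_001."""
    return 2 * psi[0, 1, 0] - psi[1, 0, 0] - psi[0, 0, 1]


# Assumed form of a Markov matrix in the adapted basis: first row (1, 0).
M = [[1, 0], [3, 5]]  # det(M) = 5
psi = {idx: 1 + idx[0] + 3 * idx[1] + 7 * idx[2]
       for idx in product(range(2), repeat=3)}
transformed = act_diagonally(M, psi, m=3)
print(w1(transformed) == 5 * w1(psi), w2(transformed) == 5 * w2(psi))
```

Both invariants pick up exactly one factor of $\det(M)$, as expected for $k = 1$.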



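As a second example, consider the case $n = 2$, $m = 4$, with Schur giving

\[ (\times^4\{1\})^{\otimes\{1\}} = \times^4\{1\} \ni 3\{31\}, \]

so there are three Markov invariants with $k = s = 1$. There are three standard
tableaux and hence three candidate Young operators:

\[ \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline 4 \\ \cline{1-1} \end{array} \qquad \begin{array}{|c|c|c|} \hline 1 & 2 & 4 \\ \hline 3 \\ \cline{1-1} \end{array} \qquad \begin{array}{|c|c|c|} \hline 1 & 3 & 4 \\ \hline 2 \\ \cline{1-1} \end{array} \]

The associated semi-standard tableau is

\[ \begin{array}{|c|c|c|} \hline 0 & 0 & 0 \\ \hline 1 \\ \cline{1-1} \end{array} \]

For the first tableau we have the symmetrized tensor:

\[ \begin{aligned} \Psi_{a_1 a_2 a_3 a_4} = {} & \psi_{a_1 a_2 a_3 a_4} + \psi_{a_2 a_1 a_3 a_4} + \psi_{a_3 a_2 a_1 a_4} + \psi_{a_1 a_3 a_2 a_4} \\ & + \psi_{a_3 a_1 a_2 a_4} + \psi_{a_2 a_3 a_1 a_4} - \psi_{a_4 a_2 a_3 a_1} - \psi_{a_2 a_4 a_3 a_1} \\ & - \psi_{a_3 a_2 a_4 a_1} - \psi_{a_4 a_3 a_2 a_1} - \psi_{a_3 a_4 a_2 a_1} - \psi_{a_2 a_3 a_4 a_1}. \end{aligned} \]

By inserting the indices we get the Markov invariant

\[ 6\psi_{0001} - 2\psi_{1000} - 2\psi_{0100} - 2\psi_{0010}. \]

And by analogy for the remaining two Young operators (with the same
semi-standard tableau) we have the Markov invariants

\[ 6\psi_{0010} - 2\psi_{1000} - 2\psi_{0100} - 2\psi_{0001}, \]

and

\[ 6\psi_{0100} - 2\psi_{1000} - 2\psi_{0010} - 2\psi_{0001}. \]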

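Our final example is the case $n = 3$, $m = 4$, with Schur giving

\[ (\times^4\{1\})^{\otimes\{1\}} = \times^4\{1\} \ni 3\{21^2\}, \]

so there are three Markov invariants with $k = s = 1$. Again, there are three
standard tableaux

\[ \begin{array}{|c|c|} \hline 1 & 2 \\ \hline 3 \\ \cline{1-1} 4 \\ \cline{1-1} \end{array} \qquad \begin{array}{|c|c|} \hline 1 & 3 \\ \hline 2 \\ \cline{1-1} 4 \\ \cline{1-1} \end{array} \qquad \begin{array}{|c|c|} \hline 1 & 4 \\ \hline 2 \\ \cline{1-1} 3 \\ \cline{1-1} \end{array} \]

with associated semi-standard tableau

\[ \begin{array}{|c|c|} \hline 0 & 0 \\ \hline 1 \\ \cline{1-1} 2 \\ \cline{1-1} \end{array} \]

From the first standard tableau we compute the symmetrized tensor:

\[ \begin{aligned} \Psi_{a_1 a_2 a_3 a_4} = {} & \psi_{a_1 a_2 a_3 a_4} + \psi_{a_2 a_1 a_3 a_4} - \psi_{a_3 a_2 a_1 a_4} - \psi_{a_2 a_3 a_1 a_4} - \psi_{a_4 a_2 a_3 a_1} \\ & - \psi_{a_2 a_4 a_3 a_1} - \psi_{a_1 a_2 a_4 a_3} - \psi_{a_2 a_1 a_4 a_3} + \psi_{a_4 a_2 a_1 a_3} + \psi_{a_2 a_4 a_1 a_3} \\ & + \psi_{a_3 a_2 a_4 a_1} + \psi_{a_2 a_3 a_4 a_1}. \end{aligned} \]

Again by filling the indices according to the semi-standard tableau we get the
Markov invariant

\[ \begin{aligned} \Psi^{(1)}_{0012} = {} & 2\psi_{0012} - \psi_{1002} - \psi_{0102} - \psi_{2010} - \psi_{0210} - 2\psi_{0021} \\ & + \psi_{2001} + \psi_{0201} + \psi_{1020} + \psi_{0120}. \end{aligned} \]

Similarly we find for the remaining two standard tableaux:

\[ \begin{aligned} \Psi^{(2)}_{0102} = {} & -\psi_{0012} + \psi_{0021} + 2\psi_{0102} - \psi_{0120} - 2\psi_{0201} \\ & + \psi_{0210} - \psi_{1002} + \psi_{1200} + \psi_{2001} - \psi_{2100}, \end{aligned} \]

and

\[ \begin{aligned} \Psi^{(3)}_{0120} = {} & \psi_{0012} - \psi_{0021} - \psi_{0102} + 2\psi_{0120} + \psi_{0201} - 2\psi_{0210} \\ & - \psi_{1020} + \psi_{1200} + \psi_{2010} - \psi_{2100}. \end{aligned} \]

These three invariants are linearly independent, as required.

5.4 Markov invariants of $\times^m\mathcal{M}(n)$ on $V^{\otimes m}$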

We now consider invariants of the group action $\times^m\mathcal{M}(n)$ on $V^{\otimes m}$ given by

\[ \psi \to M_1 \otimes M_2 \otimes \ldots \otimes M_m\,\psi; \quad M_a \in \mathcal{M}(n),\ 1 \leq a \leq m. \]

According to Conjecture 5.1.2 there exists a Markov invariant, $w$, of degree $d$
of this group action if

\[ *^m\{k+s, k^{n-1}\} \ni \{d\}, \]

for some $nk + s = d$. These Markov invariants will satisfy

\[ w(M_1 \otimes M_2 \otimes \ldots \otimes M_m\,\psi) = (\det(M_1)\det(M_2) \ldots \det(M_m))^k\,w(\psi) \]

for all $\psi \in V^{\otimes m}$ and all $M_a \in \mathcal{M}(n)$, $1 \leq a \leq m$. The inner product multiplications
computed for various cases by Schur are given in Table 5.1.
The Markov invariants can then be computed from

\[ \Psi_{i_1 \ldots i_{dm}} := Y^{(1)}_{\{k+s, k^{n-1}\}} Y^{(2)}_{\{k+s, k^{n-1}\}} \cdots Y^{(m)}_{\{k+s, k^{n-1}\}}\,\psi_{i_1 \ldots i_m}\psi_{i_{m+1} \ldots i_{2m}} \cdots \psi_{i_{(d-1)m+1} \ldots i_{dm}}, \]

where each Young operator $Y^{(a)}_{\{k+s, k^{n-1}\}}$, $1 \leq a \leq m$, is generated from a
standard tableau of shape $\{k+s, k^{n-1}\}$ with integers chosen from the set
$\{a, m+a, \ldots, (d-1)m+a\}$. The final step is to insert indices into $\Psi$ using
the semi-standard tableau:

\[ \begin{array}{|c|c|c|c|c|c|} \hline 0 & 0 & \cdots & 0 & \cdots & 0 \\ \hline 1 & 1 & \cdots & 1 \\ \cline{1-4} \vdots & \vdots & & \vdots \\ \cline{1-4} n-1 & n-1 & \cdots & n-1 \\ \cline{1-4} \end{array} \]

Again, the correct set of standard tableaux needed to generate a particular
invariant is not certain, and we proceed by computing for different cases and
checking for algebraic dependence until we get the correct number of
algebraically independent invariants.

In what follows, we will adopt a notation where a Young operator corresponding
to a certain tableau is written as $Y_{a_1, a_2, \ldots; b_1, b_2, \ldots; c_1, \ldots}$, where the commas
separate column entries in the tableau and semi-colons separate the rows.



    n        2       2       3        3        4        4
    m      {21}    {31}   {21^2}   {31^2}   {21^3}   {31^3}
    2        1       1       1        1        1        1
    3        1       1       1        1        0        1
    4        3       4       4       13        4       16
    5        5      10      10       61        6      137
    6       11      31      31      397       40     1396

Table 5.1: Occurrences of $\{d\}$ in $*^m\{k+s, k^{n-1}\}$ with $nk + s = d$

5.4.1 The stochastic invariant

For the group action of $\times^m\mathcal{M}(n)$ there is always what is known as the degree
$d = 1$ stochastic invariant, for all $m$, $n$, given by:

\[ \Phi := \omega(\otimes^m\theta). \]

This corresponds to the trivial inner product multiplication

\[ *^m\{1\} = \{1\}, \]

with $k = 0$, $s = 1$. Evaluated on any tensor $\psi \in V^{\otimes m}$ the stochastic invariant
is simply the sum of the tensor components:

\[ \Phi(\psi) = \sum_{i_1, i_2, \ldots, i_m} \psi_{i_1 i_2 \ldots i_m}. \]

In particular, evaluated on a phylogenetic tensor $P$:

\[ \Phi(P) = \sum_{i_1, i_2, \ldots, i_m} p_{i_1 i_2 \ldots i_m} = 1, \]

which motivates the terminology.
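
That the sum of components is unchanged by the action of independent Markov matrices can be checked in a few lines. The sketch assumes the convention that a Markov matrix has non-negative entries with each column summing to one, and works in the ordinary (character) basis; the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def act(mats, psi):
    """Apply M_1 (x) M_2 (x) ... (x) M_m to the tensor psi."""
    n, m = len(mats[0]), len(mats)
    out = {}
    for idx in product(range(n), repeat=m):
        total = 0
        for src in product(range(n), repeat=m):
            coeff = 1
            for M, i, a in zip(mats, idx, src):
                coeff *= M[i][a]
            total += coeff * psi[src]
        out[idx] = total
    return out


def stochastic_invariant(psi):
    """Sum of all tensor components."""
    return sum(psi.values())


# Column-stochastic matrices (assumed convention): each column sums to one.
M1 = [[F(9, 10), F(1, 5)], [F(1, 10), F(4, 5)]]
M2 = [[F(1, 2), F(1, 3)], [F(1, 2), F(2, 3)]]
psi = {(0, 0): F(3), (0, 1): F(-1), (1, 0): F(2), (1, 1): F(7)}
print(stochastic_invariant(act([M1, M2], psi)) == stochastic_invariant(psi))
```

Exact rational arithmetic is used so that the equality is not affected by rounding. No determinant factor appears, in agreement with $k = 0$.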
5.4.2 The n = 2 case 



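From Table 5.1 we see that for $m = 2$ there is a single Markov invariant for
each of $d = 3$ and $d = 4$. These can be generated by simply taking pointwise
products of the stochastic invariant with the general linear group invariant $D_2$
(2.13):

\[ \Phi \cdot D_2, \quad \Phi^2 \cdot D_2. \]

For $m = 3$ there is a Markov invariant generated from $\{21\}$. We coin this
invariant the stangle (stochastic tangle). By directed trial and error with
various tableaux, this invariant was found by taking the composition of the
three Young tableaux:

\[ \begin{array}{|c|c|} \hline 1 & 7 \\ \hline 4 \\ \cline{1-1} \end{array} \qquad \begin{array}{|c|c|} \hline 2 & 8 \\ \hline 5 \\ \cline{1-1} \end{array} \qquad \begin{array}{|c|c|} \hline 3 & 9 \\ \hline 6 \\ \cline{1-1} \end{array} \]

This is written in our new notation as

\[ \Psi_{i_1 i_2 \ldots i_9} := Y_{1,7;4}\,Y_{2,8;5}\,Y_{3,9;6}\,\psi_{i_1 i_2 i_3}\psi_{i_4 i_5 i_6}\psi_{i_7 i_8 i_9}, \tag{5.5} \]

and we find that the stangle is

\[ \begin{aligned} T_2^s = \Psi_{000111000} = {} & -2\psi_{001}\psi_{010}\psi_{100} + \psi_{000}\psi_{011}\psi_{100} + \psi_{000}\psi_{010}\psi_{101} \\ & + \psi_{000}\psi_{001}\psi_{110} - \psi_{000}^2\psi_{111}. \end{aligned} \]

For $m = 4$ there are three Markov invariants which we call the squangles
(stochastic quangles). One of these Markov invariants can be generated simply
by taking the pointwise product of the quangle with the stochastic
invariant:

\[ \Phi \cdot Q_2. \]

By directed trial and error the other two squangles have been found to be
generated from

\[ Y_{1,5;9}\,Y_{2,6;10}\,Y_{3,11;7}\,Y_{4,12;8}\,\psi_{i_1 i_2 i_3 i_4}\psi_{i_5 i_6 i_7 i_8}\psi_{i_9 i_{10} i_{11} i_{12}} \tag{5.6} \]

and from a second product of four Young operators acting on the same tensor product.

Explicitly the first squangle is

\[ \begin{aligned} Q_2^{s_1} = {} & -2\psi_{0011}\psi_{0100}\psi_{1000} + \psi_{0010}\psi_{0101}\psi_{1000} + \psi_{0001}\psi_{0110}\psi_{1000} - \psi_{0000}\psi_{0111}\psi_{1000} \\ & + \psi_{0010}\psi_{0100}\psi_{1001} + \psi_{0001}\psi_{0100}\psi_{1010} - \psi_{0000}\psi_{0100}\psi_{1011} - 2\psi_{0001}\psi_{0010}\psi_{1100} \\ & + 3\psi_{0000}\psi_{0011}\psi_{1100} - \psi_{0000}\psi_{0010}\psi_{1101} - \psi_{0000}\psi_{0001}\psi_{1110} + \psi_{0000}^2\psi_{1111}, \end{aligned} \]

and the second

\[ \begin{aligned} Q_2^{s_2} = {} & \psi_{0011}\psi_{0100}\psi_{1000} - 2\psi_{0010}\psi_{0101}\psi_{1000} + \psi_{0001}\psi_{0110}\psi_{1000} - \psi_{0000}\psi_{0111}\psi_{1000} \\ & + \psi_{0010}\psi_{0100}\psi_{1001} - 2\psi_{0001}\psi_{0100}\psi_{1010} + 3\psi_{0000}\psi_{0101}\psi_{1010} - \psi_{0000}\psi_{0100}\psi_{1011} \\ & + \psi_{0001}\psi_{0010}\psi_{1100} - \psi_{0000}\psi_{0010}\psi_{1101} - \psi_{0000}\psi_{0001}\psi_{1110} + \psi_{0000}^2\psi_{1111}. \end{aligned} \]

The three degree $d = 3$ Markov invariants $\{\Phi \cdot Q_2, Q_2^{s_1}, Q_2^{s_2}\}$ have been shown
by explicit computation to be linearly independent, as required.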
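
The $n = 2$ stangle can be reproduced from the composition (5.5) with a short script. As before this is a sketch under assumed conventions, not the thesis implementation: each Young operator is (column antisymmetrizer)(row symmetrizer), permutations relabel index positions, and positions in row r of each tableau receive the label r, starting from 0. Function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product


def parity(seq):
    """Sign of the permutation given as a sequence of distinct integers."""
    inv = sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq)) if seq[i] > seq[j])
    return -1 if inv % 2 else 1


def block_permutations(blocks):
    """Yield (mapping, sign) for each permutation acting within every block."""
    for images in product(*(permutations(block) for block in blocks)):
        mapping, sign = {}, 1
        for block, image in zip(blocks, images):
            mapping.update(zip(block, image))
            sign *= parity([block.index(x) for x in image])
        yield mapping, sign


def young_terms(tableau):
    """Signed position maps making up one Young operator."""
    rows = [list(row) for row in tableau]
    cols = [[row[c] for row in rows if c < len(row)] for c in range(len(rows[0]))]
    return [({a: q[p[a]] for a in p}, sign)
            for q, sign in block_permutations(cols)
            for p, _ in block_permutations(rows)]


def composed_component(tableaux, m):
    """Component of (Y_1 Y_2 ...)(psi ... psi) selected by the 0-based filling."""
    fill = {pos: r for t in tableaux for r, row in enumerate(t) for pos in row}
    size = len(fill)
    poly = Counter()
    # The operators act on disjoint position sets, so their maps simply merge.
    for combo in product(*(young_terms(t) for t in tableaux)):
        sigma, coeff = {}, 1
        for mapping, sign in combo:
            sigma.update(mapping)
            coeff *= sign
        idx = tuple(fill[sigma[a]] for a in range(1, size + 1))
        factors = tuple(sorted(idx[j:j + m] for j in range(0, size, m)))
        poly[factors] += coeff
    return {mono: c for mono, c in poly.items() if c}


stangle = composed_component([[[1, 7], [4]], [[2, 8], [5]], [[3, 9], [6]]], m=3)
for monomial, coefficient in sorted(stangle.items()):
    print(coefficient, monomial)
```

Under these conventions the result is the stangle polynomial given above multiplied by an overall factor of $-6$, which is immaterial for an invariant.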

5.4.3 The n = 3 case 

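From Table 5.1 there are two Markov invariants for $n = 3$, $m = 2$, of degree
$d = 4$ and $d = 5$. Again these invariants can be easily produced by taking products of
the stochastic invariant with the determinant invariant (2.13):

\[ \Phi \cdot D_3, \quad \Phi^2 \cdot D_3. \]

In the case $m = 3$ there is a single Markov invariant, which we also refer to as
the stangle:

\[ \Psi_{i_1 i_2 \ldots i_{12}} := Y_{1,4;7;10}\,Y_{2,8;5;11}\,Y_{3,12;6;9}\,\psi_{i_1 i_2 i_3}\psi_{i_4 i_5 i_6}\psi_{i_7 i_8 i_9}\psi_{i_{10} i_{11} i_{12}}, \tag{5.7} \]

so that

\[ \begin{aligned} T_3^s = {} & \Psi_{000011102220} \\ = {} & \psi_{012}\psi_{020}\psi_{101}\psi_{200} - \psi_{010}\psi_{022}\psi_{101}\psi_{200} - \psi_{011}\psi_{020}\psi_{102}\psi_{200} \\ & + \psi_{010}\psi_{021}\psi_{102}\psi_{200} - \psi_{002}\psi_{021}\psi_{110}\psi_{200} + \psi_{001}\psi_{022}\psi_{110}\psi_{200} \\ & + \psi_{002}\psi_{011}\psi_{120}\psi_{200} - \psi_{001}\psi_{012}\psi_{120}\psi_{200} - \psi_{012}\psi_{020}\psi_{100}\psi_{201} \\ & + \psi_{010}\psi_{022}\psi_{100}\psi_{201} + \psi_{002}\psi_{020}\psi_{110}\psi_{201} - \psi_{000}\psi_{022}\psi_{110}\psi_{201} \\ & - \psi_{002}\psi_{010}\psi_{120}\psi_{201} + \psi_{000}\psi_{012}\psi_{120}\psi_{201} + \psi_{011}\psi_{020}\psi_{100}\psi_{202} \\ & - \psi_{010}\psi_{021}\psi_{100}\psi_{202} - \psi_{001}\psi_{020}\psi_{110}\psi_{202} + \psi_{000}\psi_{021}\psi_{110}\psi_{202} \\ & + \psi_{001}\psi_{010}\psi_{120}\psi_{202} - \psi_{000}\psi_{011}\psi_{120}\psi_{202} + \psi_{002}\psi_{021}\psi_{100}\psi_{210} \\ & - \psi_{001}\psi_{022}\psi_{100}\psi_{210} - \psi_{002}\psi_{020}\psi_{101}\psi_{210} + \psi_{000}\psi_{022}\psi_{101}\psi_{210} \\ & + \psi_{001}\psi_{020}\psi_{102}\psi_{210} - \psi_{000}\psi_{021}\psi_{102}\psi_{210} - \psi_{002}\psi_{011}\psi_{100}\psi_{220} \\ & + \psi_{001}\psi_{012}\psi_{100}\psi_{220} + \psi_{002}\psi_{010}\psi_{101}\psi_{220} - \psi_{000}\psi_{012}\psi_{101}\psi_{220} \\ & - \psi_{001}\psi_{010}\psi_{102}\psi_{220} + \psi_{000}\psi_{011}\psi_{102}\psi_{220}. \end{aligned} \]

In the case of $m = 4$, Table 5.1 predicts four Markov invariants, which we
again refer to as squangles. One of the squangles can be inferred directly as
the pointwise product:

\[ \Phi \cdot Q_3, \]

and by directed trial and error we have shown that the other three can be
generated from the Young operators:

\[ \begin{aligned} Q_3^{s_1} & \leftarrow Y_{1,5;9;13}\,Y_{2,6;10;14}\,Y_{3,7;11;15}\,Y_{4,8;12;16}, \\ Q_3^{s_2} & \leftarrow Y_{1,9;5;13}\,Y_{2,14;6;10}\,Y_{3,7;11;15}\,Y_{4,8;12;16}, \\ Q_3^{s_3} & \leftarrow Y_{1,9;5;13}\,Y_{2,10;6;14}\,Y_{3,7;11;15}\,Y_{4,8;12;16}, \end{aligned} \tag{5.8} \]

where $\leftarrow$ indicates the implementation of our procedure with the indices of $\Psi$
filled out to create the only semi-standard tableau of shape $\{21^2\}$ using the
integers $\{0, 1, 2\}$. The four invariants $\{\Phi \cdot Q_3, Q_3^{s_1}, Q_3^{s_2}, Q_3^{s_3}\}$ have been shown
by explicit computation to be linearly independent.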

5.4.4 The n = 4 case 

In the case of $n = 4$, $m = 2$, Table 5.1 predicts Markov invariants of degree
$d = 5$ and $d = 6$. Again, these invariants can be generated easily as the pointwise
products:

\[ \Phi \cdot D_4, \quad \Phi^2 \cdot D_4. \]

In the case of $m = 3$ Table 5.1 predicts a degree $d = 6$ Markov invariant which
we again refer to as the stangle. It is generated from the Young operator

\[ T_4^s \leftarrow Y_{1,4,13;7;10;16}\,Y_{2,8,17;5;11;14}\,Y_{3,12,18;6;9;15}. \tag{5.9} \]

Explicitly this polynomial has 1404 terms.

In the case of $m = 4$ there are four degree $d = 5$ Markov invariants which we
again refer to as squangles. One of these is generated easily as

\[ \Phi \cdot Q_4, \]

and by directed trial and error the other three have been found to be given by
the Young operators:

\[ \begin{aligned} Q_4^{s_1} & \leftarrow Y_{1,5;9;13;17}\,Y_{2,6;10;14;18}\,Y_{3,7;11;15;19}\,Y_{4,8;12;16;20}, \\ Q_4^{s_2} & \leftarrow Y_{1,9;5;13;17}\,Y_{2,14;6;10;18}\,Y_{3,7;11;15;19}\,Y_{4,8;12;16;20}, \\ Q_4^{s_3} & \leftarrow Y_{1,9;5;13;17}\,Y_{2,14;6;10;18}\,Y_{3,19;7;11;15}\,Y_{4,8;12;16;20}. \end{aligned} \tag{5.10} \]

The four degree $d = 5$ Markov invariants $\{\Phi \cdot Q_4, Q_4^{s_1}, Q_4^{s_2}, Q_4^{s_3}\}$ have been
shown by explicit computation to be linearly independent, as required.



5.5 What happens on a phylogenetic tree? 

In this section we will examine the structure of the invariant functions we have
discovered on phylogenetic trees. We will focus on the case of four characters,
$n = 4$, and three and four leaves, $m = 3, 4$.
We have discovered invariant functions which satisfy

\[ w(g\psi) = \det(g)^k\,w(\psi), \]

for all $g \in \times^m\mathcal{M}(n)$ and $\psi \in V^{\otimes m}$. If we consider the case where these
invariants are evaluated on the phylogenetic tensor $P$, the invariant takes the
form

\[ w(P) = \prod_{a=1}^{m} \det(M_a)^k\,w(\widetilde{P}), \]

where $\widetilde{P}$ denotes $P$ with the Markov matrices $M_a$ on the pendant edges removed.

Our task is to examine the structure of the Markov invariants when evaluated
on the phylogenetic tensor $P$ corresponding to the various possible trees.

5.5.1 The stangle 

As we saw in Chapter 4, we need only consider unrooted phylogenetic trees.
For the case of three taxa the most general phylogenetic tree is the three-leaf star tree.
The corresponding phylogenetic tensor can be expressed as

\[ P = (M_1 \otimes M_2 \otimes M_3)\,1 \otimes \delta \cdot \delta \cdot \pi, \]

where

\[ \delta^2 := 1 \otimes \delta \cdot \delta = \delta \otimes 1 \cdot \delta. \]

From the general properties of the Markov invariants we find that

\[ T^{(s)}(P) = \det(M_1)\det(M_2)\det(M_3)\,T^{(s)}(\widetilde{P}), \quad \widetilde{P} := \delta^2\pi, \]

and by direct computation

\[ T^{(s)}(\widetilde{P}) = 0. \]

It follows that evaluating the stangle on the general phylogenetic tensor of three
leaves satisfies

\[ T^{(s)}(P) = 0. \]

This equation is independent of all the model parameters contained in the
phylogenetic tree. This observation implies that this Markov invariant also
satisfies the properties of a phylogenetic invariant for the general Markov model
PJ. ' 
5.5.2 The squangles 

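For the case of four taxa there are three inequivalent unrooted phylogenetic
trees as presented in Figure 5.1. The corresponding phylogenetic tensors are

• $P^{(1)} = M_1 \otimes M_2 \otimes M_3 \otimes M_4\,(1 \otimes 1 \otimes \delta M_5)\,\delta^2\pi$

• $P^{(2)} = M_1 \otimes M_2 \otimes M_3 \otimes M_4\,(1 \otimes \delta M_5 \otimes 1)\,\delta^2\pi$

• $P^{(3)} = M_1 \otimes M_2 \otimes M_3 \otimes M_4\,(\delta M_5 \otimes 1 \otimes 1)\,\delta^2\pi$.

For any linear combination of the Markov invariants:

\[ w = c\,\Phi \cdot Q_4 + c_1 Q_4^{s_1} + c_2 Q_4^{s_2} + c_3 Q_4^{s_3} \]

we have

\[ w(P^{(a)}) = \det(M_1)\det(M_2)\det(M_3)\det(M_4)\,w(\widetilde{P}^{(a)}), \quad a = 1, 2, 3, \]

where $\widetilde{P}^{(a)}$ denotes $P^{(a)}$ with $M_1, \ldots, M_4$ removed.
Defining the linearly independent combinations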

£l = — \Q S A + Q 4 2 + 2Q 4 3 ; 
-^2 = — \Q S A + 2Q4 2 + Q4 3 , 

L3 = —Q S 4 + Q 4 3 . 

it is possible to show by direct computation that the following relations hold:

• $L_1(P^{(1)}) = 0$, $L_2(P^{(1)}) = -L_3(P^{(1)}) > 0$;
• $L_2(P^{(2)}) = 0$, $L_1(P^{(2)}) = L_3(P^{(2)}) > 0$;
• $L_3(P^{(3)}) = 0$, $L_1(P^{(3)}) = L_2(P^{(3)}) < 0$.

This implies that these linear combinations of the squangles are not only
Markov invariants, but also phylogenetic invariants [1]. They are actually
phylogenetically informative invariants because they can be used to distinguish
between the three quartet topologies. Studying the statistical properties of
this technique is a topic of ongoing work (see Appendix A).

Figure 5.1: Three alternative quartet trees



5.6 Review of important invariants 

We tabulate the invariant functions that have been of interest in this thesis in
Table 5.2. It should be noted that in the case of the squangles the invariants
of the general linear group are included with the invariants of the Markov
semigroup.



5.7 Closing remarks 

In this chapter we have defined and proved the existence of Markov invariants.
We have shown how to derive their explicit polynomial form in interesting
cases. We examined the structure of several invariants in the context
of phylogenetic trees. Finally, we derived a novel technique of quartet tree
reconstruction which is valid under the assumptions of the general Markov
model of sequence evolution.

Table 5.2: Invariant functions satisfying $f \circ g = \det(g)^k f$



Chapter 6 



Conclusion 



In this thesis we have examined the mathematical analogy between quantum 
physics and the Markov model of a phylogenetic tree. 

In Chapter 2 we gave a review of group representation theory, established the
Schur/Weyl duality and went on to show how one-dimensional representations
and invariant functions of the general linear group can be put into correspondence.
We also presented several examples of the explicit polynomial form of these
invariants.

In Chapter 3 we concretely established the mathematical analogy between
entanglement and phylogenetic relation. We showed that group invariant
functions can be used to quantify phylogenetic relation.

In Chapter 4 we gave a review of pairwise phylogenetic distance measures and
examined the use of the tangle in improving the calculation of pairwise
distance measures from observed sequence data.

In Chapter 5 we defined and showed how to derive Markov invariant functions.
We studied their properties in cases relevant to the problem of phylogenetic
tree reconstruction. We derived a new technique for reconstruction of quartets
which is valid under the assumptions of a general Markov model.



Future investigations 

There are several clear paths for continuing the work that has been presented 
in this thesis. 

Rather than use the tangle to give improved pairwise distances it seems
judicious to examine how the tangle could be used in more direct ways. The
Neighbour-Joining (NJ) algorithm for tree reconstruction has at its core the
concept of pairwise distances, whereas the tangle polynomial
gives a measure of the sum of the branch lengths for a triplet. Hence
one possibility is to generalize the NJ algorithm in such a way
that the tangle is incorporated explicitly into the procedure. Additionally,
biologists are interested in the evolutionary distance between taxa, and another
possibility would be to use the tangle as a measure of the evolutionary distance
between triplets of taxa without decomposing this distance into pairs. Given
a set of multiple taxa one could construct interesting questions comparing
different triplets using the value of the tangle as a quantifier.

The stochastic tangle is a very interesting mathematical object as it
simultaneously satisfies the properties of a Markov invariant and those of a phylogenetic
invariant. In this thesis we have not investigated the potential of finding a
practical role for the stochastic tangle in the problem of phylogenetic
reconstruction. The possible practical roles are similar to those of the tangle
and we leave this as an open problem.

The squangles have been shown to give a new tree reconstruction algorithm
for the case of quartets. The main path for future investigation is to study
the statistical properties of such an algorithm. It is theoretically clear how to
calculate unbiased forms of the squangles (see Appendix A) and this would be
a desirable practical outcome as it will improve the performance of the quartet
reconstruction in the case where the sequence data is of relatively short length.
Unfortunately this calculation of an unbiased form is computationally difficult
and has not been achieved. To further the complete statistical understanding
it is necessary to calculate the variance of the squangles. Again this is
theoretically clear but computationally difficult as one is required to square the
polynomials.

In this thesis we have used the concept of a tree in a rather ad hoc way.
Our procedure was to compute the explicit polynomial form of the invariant
functions and then to impose a given tree structure onto the polynomial by
choosing coordinates for the tensors selected to be consistent with the tree.
Given that the existence of the invariant functions was proved using the Schur
function series, a natural question is whether the relationships between the
invariant functions that occur on particular trees can be identified by
studying the properties of the Schur functions in more detail.
The branching operator $\delta$ is technically an invertible linear operator on the
expanded linear space known as a Fock space and it follows that the character
theory of this action together with that of the Markov semigroup should
introduce the possibility of "seeing" the tree structure within the Schur functions.
This makes such an identification seem feasible.

The other clear course for theoretical investigation is to completely classify 
the ring of invariants for the Markov semigroup. This is not an easy problem 
as the Hilbert basis theorem states that the ring of invariants is guaranteed to 
be finitely generated if the group action is completely reducible [36] . However, 
the Markov group has an invariant subspace with no complementary invariant 
subspace and is hence not completely reducible. Further study is required to 
fully characterize the ring of Markov invariants. Additionally, the exact con- 
nection between the ring of Markov invariants and the ideal of phylogenetic 
invariants should be established concretely. In this thesis this connection was 
only made for the particular cases that were of interest. A well defined and 
complete description of the connection is required before one can speak with 
confidence on this matter. 



Appendix A 
Bias correction of invariant functions 



A.1 Multinomial distribution

Let $X_a$, $1 \leq a \leq n$, be the random variable which counts the occurrences of
character $a$ in a finite subset of an infinite sequence consisting of the characters
$\{1, 2, \ldots, n\}$. If each character occurs with probability $p_a$, then for a subset of
length $N$ we have the standard multinomial distribution

\[ \mathbb{P}(X_1 = k_1, X_2 = k_2, \ldots, X_n = k_n) = \frac{N!}{k_1!\,k_2! \cdots k_n!}\,p_1^{k_1} p_2^{k_2} \cdots p_n^{k_n}. \tag{A.1} \]

Defining the vector valued random variable $X = (X_1, X_2, \ldots, X_n) \in \mathbb{N}^n$, we
can express (A.1) as

\[ \mathbb{P}(X = k) = \frac{N!}{\prod_{a=1}^{n} k_a!} \prod_{b=1}^{n} p_b^{k_b}, \]

with $k = (k_1, \ldots, k_n) \in \mathbb{N}^n$ and $k_1 + k_2 + \ldots + k_n = N$. Consider any function

\[ \phi : \mathbb{C}^n \to \mathbb{C}^q, \quad q \in \mathbb{N}. \]

The expectation value of $\phi(X)$ is then defined as

\[ E[\phi(X)] = \sum_{k \in \mathbb{N}^n,\ k_1 + k_2 + \ldots + k_n = N} \mathbb{P}(X = k)\,\phi(k). \]
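
For small $N$ and $n$ the expectation can be evaluated exactly by enumerating all count vectors. The following sketch uses exact rational arithmetic; the function names are hypothetical and the probabilities are arbitrary illustrative values.

```python
from fractions import Fraction
from math import factorial


def compositions(N, n):
    """Yield every k in N^n with k_1 + ... + k_n = N."""
    if n == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, n - 1):
            yield (first,) + rest


def pmf(k, p):
    """Multinomial probability P(X = k) for character probabilities p."""
    coeff = factorial(sum(k))
    for ka in k:
        coeff //= factorial(ka)  # stays an integer at every step
    prob = Fraction(coeff)
    for ka, pa in zip(k, p):
        prob *= pa ** ka
    return prob


def expectation(phi, N, p):
    """E[phi(X)] as the finite sum over all admissible count vectors."""
    return sum(pmf(k, p) * phi(k) for k in compositions(N, len(p)))


p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
N = 4
print(expectation(lambda k: 1, N, p))            # total probability
print(expectation(lambda k: k[0], N, p), N * p[0])
```

The first line confirms that the probabilities sum to one, and the second compares $E[X_1]$ with $Np_1$.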

A.2 Generating function

For every $s \in \mathbb{R}^n$ we define the generating function $G : \mathbb{R}^n \to \mathbb{C}$ as

\[ G(s) = E[e^{i(s, X)}], \]

where we have considered $X \in \mathbb{N}^n \subset \mathbb{R}^n$ and $(s, X) = s_1 X_1 + s_2 X_2 + \ldots + s_n X_n$,
and convergence is ensured by $|e^{i(s, X)}| = 1$ and the triangle inequality.
Observe that

\[ \frac{\partial G(s)}{\partial s_a} = i\,E[X_a e^{i(s, X)}]. \]

In particular we have

\[ \left.\frac{\partial G(s)}{\partial s_a}\right|_{s=0} = i\,E[X_a]. \]

We simplify notation by taking the Laplace transform

\[ s \to -is, \]

and find that in general

\[ \left.\frac{\partial^{\,b_1 + b_2 + \ldots + b_l} G(s)}{\partial s_{a_1}^{b_1} \partial s_{a_2}^{b_2} \cdots \partial s_{a_l}^{b_l}}\right|_{s=0} = E[X_{a_1}^{b_1} X_{a_2}^{b_2} \cdots X_{a_l}^{b_l}]. \]

Computing a closed form of $G(s)$ follows easily given the identity

\[ (x_1 + x_2 + \ldots + x_n)^N = \sum_{k \in \mathbb{N}^n :\ k_1 + k_2 + \ldots + k_n = N} \frac{N!}{k_1!\,k_2! \cdots k_n!}\,x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}, \]

so that

\[ G(s) = (p_1 e^{s_1} + p_2 e^{s_2} + \ldots + p_n e^{s_n})^N. \]

In particular $G(0) = 1$.

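A.3 Expectations of polynomials

We are particularly interested in the case when

\[ \phi \in \mathbb{C}[V]_d, \quad V \cong \mathbb{C}^n. \]

In general we have

\[ E[(\phi_1 + c\phi_2)(X)] = E[\phi_1(X)] + c\,E[\phi_2(X)], \]

but

\[ E[\phi_1 \cdot \phi_2(X)] \neq E[\phi_1(X)]\,E[\phi_2(X)]. \]

Thus in order to calculate the expected value of a polynomial we need to
study expectation values of monomials:

\[ E[X_{a_1}^{b_1} X_{a_2}^{b_2} \cdots X_{a_m}^{b_m}], \quad m \leq n. \]

In particular we have

\[ \begin{aligned} E[X_a] & = \left.\frac{\partial G(s)}{\partial s_a}\right|_{s=0} = N p_a, \\ E[X_a X_b] & = \left.\frac{\partial^2 G(s)}{\partial s_a \partial s_b}\right|_{s=0} = N(N-1)\,p_a p_b + N p_a \delta_{ab}, \\ E[X_a X_b X_c] & = \left.\frac{\partial^3 G(s)}{\partial s_a \partial s_b \partial s_c}\right|_{s=0} \\ & = N(N-1)(N-2)\,p_a p_b p_c + N(N-1)\,(p_a p_b \delta_{ac} + p_a p_c \delta_{ab} + p_a p_b \delta_{bc}) + N p_a \delta_{ab}\delta_{ac}, \end{aligned} \tag{A.2} \]

and for a set of distinct integers $1 \leq a_1, a_2, \ldots, a_d \leq n$ we have

\[ E[X_{a_1} X_{a_2} \cdots X_{a_d}] = \frac{N!}{(N-d)!}\,p_{a_1} p_{a_2} \cdots p_{a_d}. \tag{A.3} \]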
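
As a worked step, the second moment in (A.2) follows from the closed form of the generating function by two differentiations. Writing $S(s) := \sum_{c} p_c e^{s_c}$ (a shorthand introduced here only for this derivation), so that $G = S^N$ and $S(0) = 1$:

```latex
\begin{aligned}
\frac{\partial G}{\partial s_a} &= N S^{N-1}\, p_a e^{s_a}, \\
\frac{\partial^2 G}{\partial s_a \partial s_b}
  &= N(N-1) S^{N-2}\, p_a p_b\, e^{s_a + s_b} + N S^{N-1}\, p_a e^{s_a}\, \delta_{ab}, \\
\left.\frac{\partial^2 G}{\partial s_a \partial s_b}\right|_{s=0}
  &= N(N-1)\, p_a p_b + N p_a \delta_{ab}.
\end{aligned}
```

When all indices are distinct every Kronecker delta vanishes, and repeating the step $d$ times leaves only the leading term $N(N-1)\cdots(N-d+1)\,p_{a_1} \cdots p_{a_d}$, which is (A.3).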



A.4 Bias correction

For a given homogeneous polynomial $\phi$ of degree $d$, we would like to find a
polynomial $\tilde{\phi}$ such that

\[ E[\tilde{\phi}(X)] = \phi(p). \]

We refer to $\tilde{\phi}$ as the unbiased form of $\phi$.

By looking at the general form of the invariants $\det_n$ it can be seen that every
monomial term is of the form (A.3). It follows easily that

\[ E[\det\nolimits_n(X)] = \frac{N!}{(N-d)!}\,\det\nolimits_n(p), \]

so that the unbiased version is given simply by

\[ \widetilde{\det}_n := \frac{(N-d)!}{N!}\,\det\nolimits_n. \]

It should be noted that this says nothing about how to find
an unbiased form of log det, because the log function is not polynomial. For
discussion on the bias correction of the log det function see [5].
We leave the computation of unbiased forms of the other invariants presented
in this thesis as an open problem. However, the process is exemplified in the
following.

Consider the expectation:

\[ E[X_1 X_2 X_3] = N(N-1)(N-2)\,p_1 p_2 p_3. \]

Thus the unbiased form of this monomial is simply

\[ \frac{1}{N(N-1)(N-2)}\,X_1 X_2 X_3. \]

Consider

\[ E[X_1^2 X_2] = N(N-1)(N-2)\,p_1^2 p_2 + N(N-1)\,p_1 p_2. \]

The unbiased form of this monomial is then

\[ \frac{1}{N(N-1)(N-2)}\,(X_1^2 X_2 - X_1 X_2), \]

since

\[ E\left[\frac{1}{N(N-1)(N-2)}\,(X_1^2 X_2 - X_1 X_2)\right] = p_1^2 p_2. \]

By generalizing (A.2) for a set of distinct integers $1 \leq a, b_1, b_2, \ldots, b_m \leq n$ it
follows that

\[ E\left[\frac{(N-(m+2))!}{N!}\,(X_a^2 X_{b_1} X_{b_2} \cdots X_{b_m} - X_a X_{b_1} X_{b_2} \cdots X_{b_m})\right] = p_a^2\,p_{b_1} p_{b_2} \cdots p_{b_m}. \]

This is the first step to computing the unbiased form of general monomials.
Clearly the process becomes more complicated as the degree of a given random
variable within each monomial becomes larger.
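
The unbiased form of the monomial $X_1^2 X_2$ can be verified exactly for small $N$ by enumerating the multinomial distribution. This is a minimal sketch with hypothetical function names and arbitrary illustrative probabilities; it also shows that the naive plug-in estimator is biased.

```python
from fractions import Fraction
from math import factorial


def compositions(N, n):
    """Yield every k in N^n with k_1 + ... + k_n = N."""
    if n == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, n - 1):
            yield (first,) + rest


def pmf(k, p):
    """Multinomial probability P(X = k)."""
    coeff = factorial(sum(k))
    for ka in k:
        coeff //= factorial(ka)
    prob = Fraction(coeff)
    for ka, pa in zip(k, p):
        prob *= pa ** ka
    return prob


def expectation(phi, N, p):
    """Exact E[phi(X)] by enumeration."""
    return sum(pmf(k, p) * phi(k) for k in compositions(N, len(p)))


def unbiased(k):
    """Unbiased form of X_1^2 X_2: (X_1^2 X_2 - X_1 X_2) / (N(N-1)(N-2))."""
    N = sum(k)
    return Fraction(k[0] ** 2 * k[1] - k[0] * k[1], N * (N - 1) * (N - 2))


def naive(k):
    """Plug-in estimator (X_1/N)^2 (X_2/N), which is biased."""
    N = sum(k)
    return Fraction(k[0] ** 2 * k[1], N ** 3)


p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
N = 5
target = p[0] ** 2 * p[1]
print(expectation(unbiased, N, p) == target)
print(expectation(naive, N, p) == target)
```

The first comparison holds exactly, while the second fails because the lower-order term $N(N-1)p_1 p_2$ and the mismatched normalization are not corrected by the plug-in form.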